How do I get Perl to read the values of my html form as unicode?

I have an HTML form that sends data to a .cgi page. Here is the HTML:

<HTML>

<BODY BGCOLOR="#FFFFFF">

    <FORM METHOD="post" ACTION="test.cgi">

        <B>Write to me below:</B><P>
        <TEXTAREA NAME="feedback" ROWS=10 COLS=50></TEXTAREA><P>

        <CENTER>
            <INPUT TYPE=submit VALUE="SEND">
            <INPUT TYPE=reset VALUE="CLEAR">
        </CENTER>

    </FORM>

</BODY>
</HTML>

Here is the Perl script for test.cgi:

#!/usr/bin/perl

use utf8;
use encoding('utf8');
require Encode;
require CGI;

# The following accepts the data from the form and puts it in %FORM

if ($ENV{'REQUEST_METHOD'} eq 'POST') {
    read(STDIN, $buffer, $ENV{'CONTENT_LENGTH'});

    @pairs = split(/&/, $buffer);

    foreach $pair (@pairs) {
        ($name, $value) = split(/=/, $pair);
        $value =~ tr/+/ /;
        $value =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C", hex($1))/eg;

        $FORM{$name} = $value;
    }

# The following generates the html for the page

    print "Content-type: text/html\n\n";
    print "<HTML>\n";
    print "<HEAD>\n";
    print "<TITLE>Thank You!</TITLE>\n";
    print "</HEAD>\n";
    print "<BODY BGCOLOR=#FFFFCC TEXT=#000000>\n";
    print "<H1>Thank You!</H1>\n";
    print "<P>\n";
    print "<H3>Your feedback is greatly appreciated.</h3><BR>\n";
    print "<P>\n<P>\n";
    print "The user wrote:\n\n";
    print "<P>\n";

# This is print statement A
    print "$FORM{'feedback'}<br>\n";

    $FORM{'feedback'}=~s/(\w)/ $1/g;

# This is print statement B
    print "$FORM{'feedback'}\n";

    print "</BODY>\n";
    print "</HTML>\n";
    exit(0);
}

This all works as intended when the user enters English text. However, it will eventually be used in a product where the user enters Chinese text. Here is an example of the problem: if the user enters "中文" into the form, Print Statement A prints "中文". Print Statement B, which prints $FORM{'feedback'} after the regex has run, prints " 2 0 0 1 3; 2 5 9 9 1; ". What I want it to print, however, is "中文". If you want to see this, go to http://thedeandp.com/chinese/input.html and try it yourself.

So basically, what I've figured out is that when Perl reads in the form, it treats each byte as a character, so the regex adds a space between each byte. Chinese characters are encoded as multiple bytes per character, so the regex splits each character apart by putting spaces between its bytes, and that produces the output seen in Print Statement B. I've tried things like $value = Encode::decode_utf8($value) to get Perl to treat the input as Unicode, but nothing has worked so far.

That CGI style could be improved while fixing your encoding/decoding issue. Try this:

use strict;
use warnings;
use Encode;
use CGI ":standard";
use HTML::Entities;

print
    header("text/html; charset=utf-8"),
    start_html("Thank you!"),
    h1("Thank You!"),
    h3("Your feedback is greatly appreciated.");

if ( my $feedback = decode_utf8( param("feedback") ) )
{
    print
        p("The user wrote:"),
        blockquote( encode_utf8( encode_entities($feedback) ) );
}

print end_html();

Proper decoding and encoding between octets (bytes) and UTF-8 characters is necessary to avoid surprises and to make Perl behave as you'd expect.
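To illustrate the difference between octets and characters, here is a minimal sketch that uses only the core Encode module. The helper name space_out and the hard-coded sample bytes are assumptions made for illustration; neither appears in the scripts above. It decodes the raw bytes first, applies the question's regex to the resulting characters, and encodes again only for output.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper (not from the original post): put a space before
# every word character, as the regex in the question does.
sub space_out {
    my ($text) = @_;
    (my $spaced = $text) =~ s/(\w)/ $1/g;
    return $spaced;
}

# "中文" as the six raw UTF-8 octets a percent-decoded form body would hold.
my $octets = "\xE4\xB8\xAD\xE6\x96\x87";

# Decode octets into a character string before any text processing.
my $chars = decode('UTF-8', $octets);

my $n_octets = length $octets;   # counts bytes
my $n_chars  = length $chars;    # counts characters

# Run the regex on characters, then encode back to octets for output.
my $out = encode('UTF-8', space_out($chars));

binmode STDOUT;                  # we are printing already-encoded bytes
print "octets: $n_octets, characters: $n_chars\n";   # octets: 6, characters: 2
print $out, "\n";
```

Because the regex runs on the decoded string, each Chinese character is treated as one word character and stays intact in the output.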

For example, you can extend the print list inside the if block with:

    h4("Which capitalizes as:"),
    blockquote( encode_utf8( uc $feedback ) );

And see character conversions work like so: å™ç∂®r£ ➟ Å™Ç∂®R£

Update: added encode_entities. NEVER print user input back without escaping the HTML. Update to the update: depending on the setup, encode_entities may also escape the non-ASCII (UTF-8) characters themselves; you can restrict it to escape only certain characters, ['"<>] for example.
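As a rough sketch of that last point: encode_entities from HTML::Entities accepts an optional second argument naming the characters to escape. The sample string is made up for illustration, and "&" is added to the set here as an assumption, so that literal ampersands in user input are escaped as well.

```perl
use strict;
use warnings;
use HTML::Entities qw(encode_entities);

# Hypothetical sample input: markup plus one non-ASCII character (é).
my $input = "<b>caf\x{E9}</b> & \"tea\"";

# Default behavior: escapes markup characters and non-ASCII characters alike.
my $all = encode_entities($input);

# Restricted behavior: escape only the characters listed in the second
# argument, leaving non-ASCII characters untouched.
my $minimal = encode_entities($input, q{<>&"'});
```

With the restricted set, the é stays a literal character, so it still has to be encoded as UTF-8 (for example with encode_utf8) before printing.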
