Invalidate form input that contains HTML

 3 years ago
source link: https://www.codesd.com/item/invalidate-form-input-that-contains-html.html

I'm using GoogleAppEngine with Python runtime and I have a very simple contact form. How do I invalidate submissions where a field contains HTML?


Try something like this: read the field's input into a string, then strip the HTML tags from that string.

The following function strips the HTML for you. It is convenient because it needs only the Python standard library:

On Python 2

from HTMLParser import HTMLParser

class MLStripper(HTMLParser):
    def __init__(self):
        self.reset()
        self.fed = []
    def handle_data(self, d):
        self.fed.append(d)
    def get_data(self):
        return ''.join(self.fed)

def strip_tags(html):
    s = MLStripper()
    s.feed(html)
    return s.get_data()

On Python 3

from html.parser import HTMLParser

class MLStripper(HTMLParser):
    def __init__(self):
        # convert_charrefs=True decodes entities such as &amp; into plain text
        super().__init__(convert_charrefs=True)
        self.fed = []
    def handle_data(self, d):
        self.fed.append(d)
    def get_data(self):
        return ''.join(self.fed)

def strip_tags(html):
    s = MLStripper()
    s.feed(html)
    s.close()  # flush any text the parser is still buffering
    return s.get_data()
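Stripping alone does not reject anything, so here is a minimal sketch of turning it into a validation check, assuming Python 3. The idea is that if stripping changes the string, the parser found markup in it. `contains_html` is a hypothetical helper name, and note that an entity such as `&amp;` also counts as a change, so it is treated as HTML too.

```python
from html.parser import HTMLParser


class MLStripper(HTMLParser):
    """Collects only the text content of the fed markup."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.fed = []

    def handle_data(self, d):
        self.fed.append(d)

    def get_data(self):
        return ''.join(self.fed)


def strip_tags(html):
    s = MLStripper()
    s.feed(html)
    s.close()  # flush any text still buffered by the parser
    return s.get_data()


def contains_html(value):
    # Hypothetical helper: if removing tags (or decoding entities)
    # changed the string, the input contained markup.
    return strip_tags(value) != value
```

For example, `contains_html("foo bar")` is `False` and `contains_html("<b>hi</b>")` is `True`.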

Another approach is to parse the text as HTML, collect the start and end tags found, and intersect that set with a known set of acceptable HTML elements.

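#!/usr/bin/env python

from __future__ import print_function

try:
    from HTMLParser import HTMLParser  # Python 2
except ImportError:
    from html.parser import HTMLParser  # Python 3

# html5lib.sanitizer is not available in html5lib 1.0 and later,
# so this import needs an older html5lib release
from html5lib.sanitizer import HTMLSanitizerMixin

class TestHTMLParser(HTMLParser):

    def __init__(self, *args, **kwargs):
        HTMLParser.__init__(self, *args, **kwargs)

        # names of every start and end tag seen while parsing
        self.elements = set()

    def handle_starttag(self, tag, attrs):
        self.elements.add(tag)

    def handle_endtag(self, tag):
        self.elements.add(tag)

def is_html(text):
    # elements that html5lib's sanitizer regards as legitimate HTML
    elements = set(HTMLSanitizerMixin.acceptable_elements)

    parser = TestHTMLParser()
    parser.feed(text)

    # HTML if at least one parsed tag is a known element
    return bool(parser.elements.intersection(elements))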

print(is_html("foo bar"))
print(is_html("<p>Hello World!</p>"))
print(is_html("<html><head><title>Title</title></head><body><p>Hello!</p></body></html>"))  # noqa

Output:

$ python foo.py
False
True
True
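If an old html5lib is not an option, the same intersection technique can be sketched with the Python 3 standard library alone. `KNOWN_ELEMENTS` below is a hypothetical, deliberately short allowlist standing in for html5lib's `acceptable_elements`, and `TagCollector` is an illustrative name. Unknown tag-like text such as `<foo>` is therefore not counted as HTML.

```python
from html.parser import HTMLParser

# Hypothetical, abbreviated allowlist standing in for html5lib's
# acceptable_elements; extend it for real use.
KNOWN_ELEMENTS = frozenset({
    "a", "b", "body", "br", "div", "em", "head", "html", "i", "img",
    "li", "p", "script", "span", "strong", "style", "title", "ul",
})


class TagCollector(HTMLParser):
    """Records the name of every start and end tag seen in the input."""

    def __init__(self):
        super().__init__()
        self.elements = set()

    def handle_starttag(self, tag, attrs):
        # html.parser lower-cases tag names; self-closing tags such as
        # <br/> also arrive here via the default handle_startendtag
        self.elements.add(tag)

    def handle_endtag(self, tag):
        self.elements.add(tag)


def is_html(text):
    parser = TagCollector()
    parser.feed(text)
    parser.close()
    # HTML only if at least one parsed tag is a known element
    return bool(parser.elements & KNOWN_ELEMENTS)
```

With this version, `is_html("<p>Hello World!</p>")` is `True` while `is_html("<foo>bar</foo>")` is `False`, because `foo` is not in the allowlist.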

You can then accept or reject the submission based on the True/False value returned. You will, of course, have to implement your own validation logic and processing around it.
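As one possible shape for that logic, here is a framework-agnostic sketch, assuming the posted form fields have already been read into a plain dict of strings in your App Engine request handler. `validate_contact_form`, `has_tags` and the error message are hypothetical names chosen for illustration. This check is stricter than the allowlist approach: it rejects any tag the parser recognizes, known element or not.

```python
from html.parser import HTMLParser


class _TagDetector(HTMLParser):
    """Sets found_tag as soon as any start or end tag is parsed."""

    def __init__(self):
        super().__init__()
        self.found_tag = False

    def handle_starttag(self, tag, attrs):
        self.found_tag = True

    def handle_endtag(self, tag):
        self.found_tag = True


def has_tags(text):
    detector = _TagDetector()
    detector.feed(text)
    detector.close()
    return detector.found_tag


def validate_contact_form(fields):
    """Return {field_name: error_message} for every field containing tags.

    An empty dict means the submission passed this check.
    """
    errors = {}
    for name, value in fields.items():
        if has_tags(value or ""):  # treat missing values as empty
            errors[name] = "HTML is not allowed in this field."
    return errors
```

If the returned dict is non-empty, redisplay the form with those messages instead of processing the submission.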

