Invalidate form input that contains HTML
source link: https://www.codesd.com/item/invalidate-form-input-that-contains-html.html
I'm using Google App Engine with the Python runtime and I have a very simple contact form. How do I invalidate submissions where a field contains HTML?
Try something like this: read the field's value into a string, then strip the HTML tags from that string. The function below does this and has the advantage of requiring only the Python standard library:
On Python 2
from HTMLParser import HTMLParser

class MLStripper(HTMLParser):
    def __init__(self):
        self.reset()
        self.fed = []

    def handle_data(self, d):
        self.fed.append(d)

    def get_data(self):
        return ''.join(self.fed)

def strip_tags(html):
    s = MLStripper()
    s.feed(html)
    return s.get_data()
For Python 3
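from html.parser import HTMLParser

class MLStripper(HTMLParser):
    def __init__(self):
        # Let the base class initialise its own state; the old `strict`
        # argument no longer exists in current Python 3 releases.
        super().__init__(convert_charrefs=True)
        self.fed = []

    def handle_data(self, d):
        # Collect only the text between tags.
        self.fed.append(d)

    def get_data(self):
        return ''.join(self.fed)

def strip_tags(html):
    s = MLStripper()
    s.feed(html)
    s.close()  # flush any text the parser is still buffering
    return s.get_data()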
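To turn the stripper into a validity check, one option is to compare the stripped text with the original input: if they differ, the field contained markup. The sketch below assumes Python 3; `contains_html` is a hypothetical helper name, not part of the original answer.

```python
from html.parser import HTMLParser


class MLStripper(HTMLParser):
    """Collects only the text content, dropping all tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.fed = []

    def handle_data(self, d):
        self.fed.append(d)

    def get_data(self):
        return ''.join(self.fed)


def strip_tags(html):
    s = MLStripper()
    s.feed(html)
    s.close()  # flush text the parser may still be buffering
    return s.get_data()


def contains_html(value):
    # Hypothetical helper: the input counts as HTML if stripping changed it.
    return strip_tags(value) != value
```

Because `convert_charrefs=True` also decodes character references, input such as `a &amp; b` is reported as changed too. Whether that should invalidate a submission is a design choice left to the form's author.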
Another way is to attempt to parse the text as HTML, collect the start and end tags that are found, and intersect that set with a known set of acceptable HTML elements. If the intersection is non-empty, the text contains HTML.
#!/usr/bin/env python
# Python 2 only (uses the HTMLParser module). HTMLSanitizerMixin comes from
# older html5lib releases and may be missing from newer ones.
from __future__ import print_function

from HTMLParser import HTMLParser

from html5lib.sanitizer import HTMLSanitizerMixin


class TestHTMLParser(HTMLParser):
    def __init__(self, *args, **kwargs):
        HTMLParser.__init__(self, *args, **kwargs)
        self.elements = set()  # names of every tag seen while parsing

    def handle_starttag(self, tag, attrs):
        self.elements.add(tag)

    def handle_endtag(self, tag):
        self.elements.add(tag)


def is_html(text):
    elements = set(HTMLSanitizerMixin.acceptable_elements)
    parser = TestHTMLParser()
    parser.feed(text)
    # True if at least one parsed tag is a known HTML element.
    return bool(parser.elements.intersection(elements))


print(is_html("foo bar"))
print(is_html("<p>Hello World!</p>"))
print(is_html("<html><head><title>Title</title></head><body><p>Hello!</p></body></html>"))  # noqa
Output:
$ python foo.py
False
True
True
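On Python 3 the `HTMLParser` module is named `html.parser`, and `html5lib.sanitizer` may not be available in the installed html5lib. A minimal stdlib-only sketch of the same intersection idea follows. `KNOWN_ELEMENTS` is a hand-picked, hypothetical stand-in for `HTMLSanitizerMixin.acceptable_elements`, and `TagCollector` is an illustrative name.

```python
from html.parser import HTMLParser

# Assumption: a small hand-picked set standing in for
# HTMLSanitizerMixin.acceptable_elements; extend it as needed.
KNOWN_ELEMENTS = frozenset({
    "a", "b", "body", "br", "div", "em", "head", "html", "i", "img",
    "li", "ol", "p", "script", "span", "strong", "title", "ul",
})


class TagCollector(HTMLParser):
    """Records the name of every start and end tag seen while parsing."""

    def __init__(self):
        super().__init__()
        self.elements = set()

    def handle_starttag(self, tag, attrs):
        self.elements.add(tag)

    def handle_endtag(self, tag):
        self.elements.add(tag)


def is_html(text):
    parser = TagCollector()
    parser.feed(text)
    parser.close()
    # Non-empty intersection means at least one known element was found.
    return bool(parser.elements & KNOWN_ELEMENTS)
```

With this set, tags that are not listed (for example `<unknown>`) do not count as HTML, which mirrors the allow-list behaviour of the original snippet.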
You can then accept or reject the submission based on the True/False value returned. You will, of course, have to implement your own validation logic and error handling.
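As one possible shape for that logic, the sketch below checks every field of a submission and collects an error message for each field that contains a tag. It assumes Python 3 and that the form data is already available as a plain dict of strings. `contains_tag`, `validate_submission`, and the field names are hypothetical and not tied to any App Engine API.

```python
from html.parser import HTMLParser


class _TagDetector(HTMLParser):
    """Sets a flag as soon as any start or end tag is parsed."""

    def __init__(self):
        super().__init__()
        self.found_tag = False

    def handle_starttag(self, tag, attrs):
        self.found_tag = True

    def handle_endtag(self, tag):
        self.found_tag = True


def contains_tag(text):
    detector = _TagDetector()
    detector.feed(text)
    detector.close()
    return detector.found_tag


def validate_submission(fields):
    """Return {field name: error message} for each field containing a tag."""
    return {
        name: "HTML is not allowed in this field."
        for name, value in fields.items()
        if contains_tag(value)
    }


# Hypothetical contact-form data: reject the submission if any field fails.
errors = validate_submission({"name": "Alice", "message": "<b>hi</b>"})
if errors:
    print("Rejected:", errors)
```

An empty result means every field passed, so the request handler can go on to process the message. Otherwise the error dict can be rendered back next to the offending fields.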