When not to use a regex
source link: https://drewdevault.com/2017/08/13/When-not-to-use-a-regex.html
August 13, 2017 on Drew DeVault's blog
The other day, I saw “Learn regex the easy way”. This is a great resource, but I felt the need to pen a post explaining that regexes are usually not the right approach.
Let’s do a little exercise. I googled “URL regex” and here’s the first Stack Overflow result:
https?:\/\/(www\.)?[-a-zA-Z0-9@:%._\+~#=]{2,256}\.[a-z]{2,6}\b([-a-zA-Z0-9@:%_\+.~#?&//=]*)
This is a bad regex. Here are some valid URLs that this regex fails to match:
Here are some invalid URLs the regex is fine with:
This answer has been revised 9 times on Stack Overflow, and this is the best they could come up with. Go back and read the regex. Can you tell where each of these bugs is? How long did it take you? If you received a bug report in your application because one of these URLs was handled incorrectly, would you understand this regex well enough to fix it? If your application has a URL regex, go find it and see how it fares with these tests.
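To run this exercise against your own cases, a small harness helps. This is a sketch: the test URLs and their expected verdicts below are illustrative choices made for this example, not the lists from the original post, and the names CASES and audit are hypothetical. It uses re.fullmatch so the whole string has to match, which is how a validation regex is normally applied.

```python
import re

# The Stack Overflow URL regex quoted above, copied verbatim.
URL_REGEX = re.compile(
    r"https?:\/\/(www\.)?[-a-zA-Z0-9@:%._\+~#=]{2,256}\.[a-z]{2,6}\b([-a-zA-Z0-9@:%_\+.~#?&//=]*)"
)

# Hypothetical test cases for this sketch: (url, should it be accepted?)
CASES = [
    ("https://example.com/path?q=1", True),
    ("http://localhost", True),              # no dot, so the required ".tld" part has nothing to match
    ("http://[::1]/", True),                 # IPv6 literal; "[" is not in the host character class
    ("ftp://example.com", True),             # the regex only allows the http and https schemes
    ("http://example.com:99999999", False),  # port above 65535, which the WHATWG URL Standard rejects
]


def audit(pattern, cases):
    """Return the cases where the pattern's verdict disagrees with the expected one."""
    return [(url, expected) for url, expected in cases
            if bool(pattern.fullmatch(url)) != expected]


for url, expected in audit(URL_REGEX, CASES):
    print(f"{'rejects valid' if expected else 'accepts invalid'}: {url}")
```

Swapping your own application's pattern in for URL_REGEX, and your own bug reports in for CASES, turns this into a regression test you can keep.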
Complicated regexes are opaque, unmaintainable, and often wrong. The correct approach to validating a URL is as follows:
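from urllib.parse import urlparse

def is_url_valid(url):
    try:
        # urlparse is lenient: it raises ValueError only for some malformed
        # input, such as an unbalanced "[" in the host, and splits most other strings.
        urlparse(url)
        return True
    except ValueError:
        return False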
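Because urlparse splits almost any string into components instead of rejecting it (urlparse("hello") succeeds, with "hello" as the path), the function above accepts a lot. If you need to know that the input is a usable web address, a sketch that also inspects the parsed parts might look like this. The http/https-only policy and the name is_web_url are assumptions for illustration, not part of the original post.

```python
from urllib.parse import urlsplit

# Assumption for this sketch: only absolute http(s) URLs with a host count as valid.
ALLOWED_SCHEMES = {"http", "https"}


def is_web_url(url: str) -> bool:
    """Return True if url parses as an http(s) URL with a host and, if given, a valid port."""
    try:
        parts = urlsplit(url)
        # Reading .port raises ValueError for a non-numeric or out-of-range port.
        _ = parts.port
    except ValueError:
        # urlsplit itself raises ValueError for e.g. an unbalanced "[" in the host.
        return False
    # urlsplit lowercases the scheme; .hostname is None when there is no host.
    return parts.scheme in ALLOWED_SCHEMES and bool(parts.hostname)
```

For example, is_web_url("https://example.com") is True and is_web_url("hello") is False. Requiring a host rules out URLs such as mailto:someone@example.com, which is fine for links to web pages but wrong if your application accepts every URL scheme. The parser still does the hard work; the extra checks only encode your policy.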
A regex is useful for validating simple patterns and for finding patterns in text. For anything beyond that it’s almost certainly a terrible choice. Say you want to…
validate an email address: try to send an email to it!
validate password strength requirements: estimate the complexity with zxcvbn!
validate a date: use your standard library! datetime.datetime.strptime
validate a credit card number: run the Luhn algorithm on it!
validate a social security number: alright, use a regex. But don’t expect the number to be assigned to someone until you ask the Social Security Administration about it!
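A sketch of date validation with datetime.strptime, assuming YYYY-MM-DD input (the format string is a parameter you would adjust, and the name is_valid_date is hypothetical). Unlike a pattern such as \d{4}-\d{2}-\d{2}, the parser knows calendar rules, so it rejects 2017-02-29 while accepting 2016-02-29.

```python
from datetime import datetime


def is_valid_date(text: str, fmt: str = "%Y-%m-%d") -> bool:
    """Return True if text is a real calendar date in the given strptime format."""
    try:
        # strptime raises ValueError for a format mismatch or an impossible date.
        datetime.strptime(text, fmt)
    except ValueError:
        return False
    return True
```

One caveat: strptime is lenient about zero padding, so "2017-8-13" is also accepted with the default format. If you need strictly zero-padded input, compare the parsed date's formatted string back against the input.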
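A sketch of the Luhn checksum, assuming the card number arrives as a string that may contain spaces or hyphens (the name luhn_valid is hypothetical). Starting from the rightmost digit, every second digit is doubled, 9 is subtracted from any result above 9, and the number passes if the total is divisible by 10. The checksum catches any single mistyped digit and most swaps of adjacent digits; it says nothing about whether the account exists.

```python
def luhn_valid(card_number: str) -> bool:
    """Return True if card_number passes the Luhn checksum (spaces and hyphens ignored)."""
    digits = card_number.replace(" ", "").replace("-", "")
    # Reject empty input and anything other than ASCII digits.
    if not digits or not all(c in "0123456789" for c in digits):
        return False
    total = 0
    # Walk from the rightmost (check) digit; double every second digit.
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0
```

For example, the widely used test number 4111 1111 1111 1111 passes, while changing its last digit to 2 makes it fail.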
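For the one case where a regex is conceded, here is a sketch that checks the AAA-GG-SSSS format and excludes the ranges the Social Security Administration never assigns: area 000, 666, or 900 to 999; group 00; serial 0000. Requiring the hyphens is an assumption, and the name is_plausible_ssn is hypothetical. Passing this check still does not mean the number has been assigned to anyone.

```python
import re

# [0-9] rather than \d, because \d also matches non-ASCII digits in Python str patterns.
SSN_PATTERN = re.compile(r"(?!000|666|9[0-9]{2})[0-9]{3}-(?!00)[0-9]{2}-(?!0000)[0-9]{4}")


def is_plausible_ssn(text: str) -> bool:
    """Return True if text has SSN format and avoids never-assigned ranges."""
    return SSN_PATTERN.fullmatch(text) is not None
```

The negative lookaheads keep each exclusion next to the field it applies to, which is about as complicated as a regex should get before a few explicit comparisons become the clearer choice.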
Get the picture?