Python regex to strip emoji from a string
source link: https://gist.github.com/Alex-Just/e86110836f3f93fe7932290526529cd1
Does not work when the emoji is at the end of a sentence.
Thanks a lot
It works for me
Thanks, works very well
In this question on Stack Overflow, a user said that this function doesn't cover all emoji, so it is better to use:
import re

def strip_emoji(text):
    RE_EMOJI = re.compile(u'([\U00002600-\U000027BF])|([\U0001f300-\U0001f64F])|([\U0001f680-\U0001f6FF])')
    return RE_EMOJI.sub(r'', text)
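As a quick check of the three-range function above, a sketch like the following shows what it removes and what it leaves behind. The sample strings are made up for illustration, and the pattern is compiled once at module level rather than on every call, which is a small change from the snippet as posted.

```python
import re

# Same three ranges as the snippet above, merged into one character class
# and compiled once at import time (an assumption, not the posted form).
RE_EMOJI = re.compile(
    "[\U00002600-\U000027BF"   # Miscellaneous Symbols + Dingbats
    "\U0001F300-\U0001F64F"    # symbols & pictographs + emoticons
    "\U0001F680-\U0001F6FF]"   # transport & map symbols
)


def strip_emoji(text):
    """Remove every code point that falls in one of the three ranges."""
    return RE_EMOJI.sub("", text)


# Green heart (U+1F49A) is inside the second range and is removed.
print(repr(strip_emoji("Python is fun \U0001F49A")))

# Thinking face (U+1F914) sits in the U+1F900-U+1F9FF block, which these
# three ranges do not cover, so it is left in place.
print(strip_emoji("hmm \U0001F914") == "hmm \U0001F914")
```

The second call illustrates why later comments in this thread add more blocks to the pattern.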
For the record, this is the pattern we are using:
import re

# https://en.wikipedia.org/wiki/Unicode_block
EMOJI_PATTERN = re.compile(
    "["
    "\U0001F1E0-\U0001F1FF"  # flags (iOS)
    "\U0001F300-\U0001F5FF"  # symbols & pictographs
    "\U0001F600-\U0001F64F"  # emoticons
    "\U0001F680-\U0001F6FF"  # transport & map symbols
    "\U0001F700-\U0001F77F"  # alchemical symbols
    "\U0001F780-\U0001F7FF"  # Geometric Shapes Extended
    "\U0001F800-\U0001F8FF"  # Supplemental Arrows-C
    "\U0001F900-\U0001F9FF"  # Supplemental Symbols and Pictographs
    "\U0001FA00-\U0001FA6F"  # Chess Symbols
    "\U0001FA70-\U0001FAFF"  # Symbols and Pictographs Extended-A
    "\U00002702-\U000027B0"  # Dingbats
    "\U000024C2-\U0001F251"
    "]+"
)
mghayour commented on Apr 1, 2020
@mgaitan it works perfectly for me, thanks a lot
import re

def add_space_between_emojies(text):
    # Ref: https://gist.github.com/Alex-Just/e86110836f3f93fe7932290526529cd1#gistcomment-3208085
    # Ref: https://en.wikipedia.org/wiki/Unicode_block
    EMOJI_PATTERN = re.compile(
        "(["
        "\U0001F1E0-\U0001F1FF"  # flags (iOS)
        "\U0001F300-\U0001F5FF"  # symbols & pictographs
        "\U0001F600-\U0001F64F"  # emoticons
        "\U0001F680-\U0001F6FF"  # transport & map symbols
        "\U0001F700-\U0001F77F"  # alchemical symbols
        "\U0001F780-\U0001F7FF"  # Geometric Shapes Extended
        "\U0001F800-\U0001F8FF"  # Supplemental Arrows-C
        "\U0001F900-\U0001F9FF"  # Supplemental Symbols and Pictographs
        "\U0001FA00-\U0001FA6F"  # Chess Symbols
        "\U0001FA70-\U0001FAFF"  # Symbols and Pictographs Extended-A
        "\U00002702-\U000027B0"  # Dingbats
        "])"
    )
    # Wrap each matched emoji in spaces; \1 is the captured emoji
    text = re.sub(EMOJI_PATTERN, r' \1 ', text)
    return text
EDIT:
I deleted the last range, "\U000024C2-\U0001F251", because it also matches Persian characters, which caused a bug for me.
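To see how broad that deleted range is, a small sketch can test a few non-emoji characters against it alone. The comment does not say which Persian characters were affected; one plausible cause, assumed here, is the Arabic Presentation Forms blocks (U+FB50-U+FDFF and U+FE70-U+FEFF), which fall inside the range, while the basic Arabic block (U+0600-U+06FF) does not. The helper name and sample characters are chosen for illustration.

```python
import re

# The single range that was dropped in the EDIT above, isolated for inspection.
BROAD_RANGE = re.compile("[\U000024C2-\U0001F251]")


def is_matched(ch):
    """Return True if the character falls inside the broad range."""
    return BROAD_RANGE.search(ch) is not None


samples = {
    "Arabic letter seen (U+0633)": "\u0633",
    "Arabic ligature lam with alef, isolated form (U+FEFB)": "\uFEFB",
    "CJK ideograph (U+4E2D)": "\u4E2D",
    "Hangul syllable (U+D55C)": "\uD55C",
}

for label, ch in samples.items():
    print(f"{label}: matched={is_matched(ch)}")
```

Only the first sample falls outside the range. Because the range runs from U+24C2 to U+1F251, it also covers CJK, Hangul, and most of the rest of the Basic Multilingual Plane above U+24C2, so it strips far more than emoji.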
Hello, I credited your work for a workaround in a youtube-dl issue:
ytdl-org/youtube-dl#5042 (comment)
It has helped a lot, thank you.
Shellbye commented on Jun 9, 2020
In case someone has from __future__ import unicode_literals at the top of the file, you need to escape "-" like this:
EMOJI_PATTERN = re.compile(
    "["
    "\U0001F1E0-\U0001F1FF"  # flags (iOS)
    "\U0001F300-\U0001F5FF"  # symbols & pictographs
    "\U0001F600-\U0001F64F"  # emoticons
    "\U0001F680-\U0001F6FF"  # transport & map symbols
    "\U0001F700-\U0001F77F"  # alchemical symbols
    "\U0001F780-\U0001F7FF"  # Geometric Shapes Extended
    "\U0001F800-\U0001F8FF"  # Supplemental Arrows-C
    "\U0001F900-\U0001F9FF"  # Supplemental Symbols and Pictographs
    "\U0001FA00-\U0001FA6F"  # Chess Symbols
    "\U0001FA70-\U0001FAFF"  # Symbols and Pictographs Extended-A
    "\U00002702-\U000027B0"  # Dingbats
    "\U000024C2-\U0001F251"
    "]+"
)
Otherwise you will get a "bad character range" error, as described in this Stack Overflow question.
Lakril commented on Apr 7, 2021
Thanks for your help.
import re

def add_space_between_emojies(text):
    '''
    >>> add_space_between_emojies('Python is fun 💚')
    'Python is fun '
    '''
    # EMOJI comes from the third-party advertools package
    from advertools.emoji import EMOJI
    EMOJI_PATTERN = EMOJI
    text = re.sub(EMOJI_PATTERN, r'', text)
    return text
Sorry to say this, but I think @mgaitan's regex is not perfect.
Recent emoji include various combinations and sequences of code points, so a complete pattern would be a more complex expression.
A good implementation example, in JavaScript, is https://github.com/mathiasbynens/emoji-regex
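The point about combinations can be illustrated with a short sketch. Many emoji are sequences of several code points glued together by a zero width joiner (U+200D) or followed by variation selector-16 (U+FE0F), and a pattern built only from block ranges removes the pictographs but leaves those invisible code points in the string. The pattern names below are hypothetical, and the second pattern is only one possible extension, not the approach taken by emoji-regex.

```python
import re

# Block-range pattern in the style used earlier in this thread (a subset of the ranges).
RANGE_ONLY = re.compile(
    "["
    "\U0001F300-\U0001F5FF"  # symbols & pictographs
    "\U0001F600-\U0001F64F"  # emoticons
    "\U00002702-\U000027B0"  # Dingbats
    "]+"
)

# Hypothetical extension: also remove the joiner and selector code points
# that hold multi-code-point emoji together.
WITH_JOINERS = re.compile(
    "["
    "\U0001F300-\U0001F5FF"
    "\U0001F600-\U0001F64F"
    "\U00002702-\U000027B0"
    "\u200D"                 # zero width joiner
    "\uFE0F"                 # variation selector-16
    "]+"
)

family = "\U0001F468\u200D\U0001F469\u200D\U0001F467"  # man, ZWJ, woman, ZWJ, girl
red_heart = "\u2764\uFE0F"                              # heavy black heart + VS16

print(ascii(RANGE_ONLY.sub("", family)))      # '\u200d\u200d'
print(ascii(RANGE_ONLY.sub("", red_heart)))   # '\ufe0f'
print(ascii(WITH_JOINERS.sub("", family)))    # ''
print(ascii(WITH_JOINERS.sub("", red_heart))) # ''
```

Removing U+200D everywhere can alter text in scripts that use the joiner for their own purposes, so this is a sketch of the problem rather than a general fix.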
@clichedmoog you are totally right, everything here is a simplification. For a complete and accurate emoji remover for Python, I recommend the library https://github.com/bsolomon1124/demoji, which downloads the latest emoji specification to build the pattern. It's not super fast, but it's exhaustive.