Remove accents and filter non-alphanumeric characters

 2 years ago
source link: https://www.codesd.com/item/remove-accents-and-filter-non-alphanumeric-characters.html

How can I filter the non-alphanumeric characters out of a string but keep the accented letters untouched?

Example:

$string = "présentation d'un texte, avec des accents (en français!) & autres...";

should become:

$string = "présentation dun texte avec des accents en français  autres";


You can try this regex:

$str = "présentation d'un texte, avec des accents (en français!) & autres...";
echo preg_replace('/[^\p{L}\s\p{N}]+/u', '', $str);
//=> présentation dun texte avec des accents en français  autres

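The character class [^\p{L}\s\p{N}]+ matches a run of one or more characters that are NOT:

  1. Unicode letters (\p{L}), which include accented letters such as é and ç
  2. Unicode numbers (\p{N}), such as the digits 0-9
  3. whitespace (\s)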
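To check the three categories in the list above against more than the sample sentence, here is a minimal PHP sketch. The function name `strip_non_alnum` and the test strings are hypothetical, chosen for illustration. It also covers one assumption the original answer does not address: text in decomposed Unicode form, where an accent is stored as a separate combining code point (U+0301) after its base letter. Combining marks belong to `\p{M}`, not `\p{L}`, so the original class would drop them; the optional `$keepMarks` parameter adds `\p{M}` for that case.

```php
<?php
// Hypothetical helper (name chosen for this sketch): removes every run of
// characters that is not a Unicode letter (\p{L}), whitespace (\s) or a
// Unicode number (\p{N}). With $keepMarks = true it also keeps combining
// marks (\p{M}), which matters when an accent is stored as a separate code
// point after its base letter (decomposed/NFD text).
function strip_non_alnum(string $s, bool $keepMarks = false): string
{
    $class = $keepMarks ? '\p{L}\p{M}\s\p{N}' : '\p{L}\s\p{N}';

    // /u: read pattern and subject as UTF-8 code points, not bytes.
    $out = preg_replace('/[^' . $class . ']+/u', '', $s);

    // preg_replace returns null on failure, e.g. for invalid UTF-8 input.
    if ($out === null) {
        throw new RuntimeException('preg_replace failed (invalid UTF-8 input?)');
    }
    return $out;
}

echo strip_non_alnum("prix: 42 €\tnet!"), "\n";    // digits and the tab kept; ':', '€', '!' removed
echo strip_non_alnum("Привет, мир!"), "\n";        // Привет мир
echo strip_non_alnum("cafe\u{0301}!"), "\n";       // cafe (combining accent dropped)
echo strip_non_alnum("cafe\u{0301}!", true), "\n"; // café, accent kept as U+0301
```

If the input is known to be precomposed (NFC), the original class is enough; text from sources that produce decomposed forms, such as some macOS file names, may need `\p{M}` as well.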

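The /u flag tells PCRE to treat both the pattern and the subject as UTF-8, so that a multibyte character such as é is matched as a single letter rather than as separate bytes.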
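The effect of the /u flag can be seen by running the same pattern with and without it. This is a comparison sketch only, not a recommended use: without /u, PCRE reads the subject byte by byte, so each two-byte UTF-8 sequence for é or ç is tested as two separate characters, and one of the two bytes is removed as a non-letter. The result is no longer the intended text.

```php
<?php
$str = "présentation d'un texte, avec des accents (en français!) & autres...";

// With /u: é and ç are single code points matched by \p{L}.
$withU = preg_replace('/[^\p{L}\s\p{N}]+/u', '', $str);

// Without /u: the subject is examined as raw bytes, so accented letters
// are split apart and partly stripped.
$withoutU = preg_replace('/[^\p{L}\s\p{N}]+/', '', $str);

// preg_match('//u', ...) returns 1 only when the string is valid UTF-8.
var_dump(preg_match('//u', $withU)); // int(1)
var_dump($withU === $withoutU);
```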

Related Articles

How do I remove leading and trailing non-alphanumeric characters in PHP?

I'm looking to "trim" non-alphanumerics from a string, similar to how trim() works with whitespace. Help me convert #str|ng# to str|ng. I can remove trailing non-alphanumerics with: $string = preg_replace('/\W+$/', '', $string); // converts `#st

Trimming &nbsp (and other non-alphanumeric characters) in T-SQL

We have some input data that sometimes appears with &nbsp characters on the end. The data comes in from the source system as varchar() and our attempts to cast as decimal fail b/c of these characters. Ltrim and Rtrim don't remove the characters, so w

How can I remove the copyright symbol and other non-ASCII characters from my Java String?

I'm using Java 6 (not an option to upgrade at this time). I have a Java string that contains the following value: My Product Edition 2014© The last symbol is a copyright symbol (©). When this string outputs to my terminal (using bash on Mac 10.9.5),

Remove spaces between non-alphanumeric characters

How do I remove whitespaces in between non-alphanumeric characters? For example anti - C6 / 36 membrane antibodies D2 NS1 - P1 - specific antibodies To anti-C6/36 membrane antibodies D2 NS1-P1-specific antibodies You can use this lookaround based reg

PHP regex to remove all non-alphanumeric characters and spaces from a string

I need a regex to remove all non-alphanumeric and space characters, I have this $page_title = preg_replace("/[^A-Za-z0-9 ]/", "", $page_title); but it doesn't remove space characters and replaces some non-alphanumeric characters with n

Is there an easier way to remove non-alphanumeric characters and replace spaces?

I would like to replace all non-alphanumeric characters, and replace spaces with underscores. So far I've come up with this using multiple regex which works but is there a more 'efficient' way? "Well Done!".toLowerCase().replace(/\s/, '-').repla

Removing non-alphanumeric characters from file names and renaming the files in Python

I'm trying to get rid of non alphanumeric characters within a source folder and rename any files with non-alphanumeric characters to versions without by using this code. However every time I run the module I get this error, Traceback (most recent cal

Java regex: remove non-alphanumeric characters except line breaks

I'm trying to remove all the non-alphanumeric characters from a String in Java but keep the carriage returns. I have the following regular expression, but it keeps joining words before and after a line break. [^\\p{Alnum}\\s] How would I be able to p

Removing non-alphanumeric characters with a regex substitution

I have this code and I want to remove the non-alphanumeric characters. The problem is it removes the Arabic words as well. How can i keep Arabic characters and remove just the non alphanumeric characters. # -*- coding: utf-8 -*- import re hello = u"س

Removing non-alphanumeric characters from an ordered collection of objects (list) in R

I have a question about removing non-alphanumeric characters from a list in R. I have a list will all sorts of odd characters, blanks, etc. and would like to remove them. I'm generally able to remove what I want using the tm package in r. I fiddled a

Java regular expression to remove all non-alphanumeric characters EXCEPT spaces

I'm trying to write a regular expression in Java which removes all non-alphanumeric characters from a paragraph, except the spaces between the words. This is the code I've written: paragraphInformation = paragraphInformation.replaceAll("[^a-zA-Z0-9\s

Splitting a set of characters into alphanumeric and non-alphanumeric characters

I'm just a student and I want to know about this array in c++. How can I display all alphanumeric chars inputted on array k to array n and all non-alphanumeric on array t? This what I made, and I don't know what's next int main(int argc, char *argv[]

Java substring method: How to delete all non-alphanumeric characters?

I would like to remove all non-alphanumerical characters from this string, and other types of strings like this one Unable_to_locate_element_{"method""link_text","selector""ikljbhfvdesiofsdjkl"} So I can use this st

How to translate non-alphanumeric characters in a string in XSLT v1.0

I use XSLT v1.0 and want to translate "UÅ1atx-ß3Å" to "UAA1atx-SS3AA" which "Å" = 'AA' and "ß" = "SS", but so far no luck. I only can change lower-case to upper-case and remove none alphanumeric characters
