When working on a JavaScript project with AngularJS 1.6, I have a list of strings which I'd like to filter. For instance, assume my list contains árbol, cigüeña, nido and tubo.
When filtering strings in Spanish, if I filtered for "u", I'd expect both cigüeña and tubo to appear, which would be the most natural result for a Spaniard. However, this is not the case in German - u and ü are different letters and thus a German will not want to see cigüeña on the list. So I am looking for a way to make my list filtering aware of the user's locale.
I happen to have an object containing lots of diacritics, such that:
    diacritics["á"] = "a";
    diacritics["ü"] = "u";
    // and so on...
This is what my filtering code looks like:
    function matches(word, search) {
        var cleanWord = removeDiacritics(word.toLowerCase());
        var cleanSearch = removeDiacritics(search.toLowerCase());
        return cleanWord.indexOf(cleanSearch) > -1;
    }

    function removeDiacritics(word) {
        // Replace each non-ASCII character with its mapped base letter, if any.
        function match(a) {
            return diacritics[a] || a;
        }
        return word.replace(/[^\u0000-\u007E]/g, match);
    }
The above code just removes all diacritics, so I thought to make it aware of the user's locale. Thus, I changed the match() function to this:
    function match(a) {
        if (diacritics[a] && a.localeCompare(diacritics[a]) === 0) {
            return diacritics[a];
        }
        return a;
    }
Unfortunately, this doesn't work. The localeCompare function returns the same values when comparing "u" and "ü" with the German and Spanish locales, so that was not the answer here. I've gone over the reference for the localeCompare method and tried the usage and sensitivity options, but they don't seem to help much here.
How could I tweak my code for this to work? Is there any library which can handle this properly for me?
2 Answers
Answer 1
I'd go about getting the user's locale directly from the browser via navigator, an object representing the user agent:
var language = navigator.language;
This will assign language the locale code of the user's browser, in my case en-US. I found this site helpful for finding locale codes to test other regions of the world.
My strFromLocale function is comparable to your removeDiacritics function:
    function strFromLocale(str) {
        function match(letter) {
            // True when the locale treats both letters as the same base letter.
            function letterMatch(letter, normalizedLetter) {
                var location = new Intl.Collator(language, {
                    usage: 'search',
                    sensitivity: 'base'
                }).compare(letter, normalizedLetter);
                return location === 0;
            }

            // Decompose the letter and drop its combining marks (ü -> u).
            var normalizedLetter = letter.normalize('NFD').replace(/[\u0300-\u036f]/g, "");
            if (letterMatch(letter, normalizedLetter)) {
                return normalizedLetter;
            } else {
                return letter;
            }
        }
        return str.replace(/[^\u0000-\u007E]/g, match);
    }
Note the line with Intl.Collator. It compares the accented letter with its unaccented form under the given language's search collation rules. With sensitivity: 'base', the result is 0 only when that language treats the two as the same base letter. Therefore:
    /* English */
    new Intl.Collator('en-US', {usage: 'search', sensitivity: 'base' }).compare('u', 'ü');
    >>> 0

    /* Swedish */
    new Intl.Collator('sv', {usage: 'search', sensitivity: 'base' }).compare('u', 'ü');
    >>> -1

    /* German */
    new Intl.Collator('de', {usage: 'search', sensitivity: 'base' }).compare('u', 'ü');
    >>> -1
As you can see in the letterMatch function, it returns true if and only if the Intl.Collator comparison returns 0. That result indicates the language does not treat the accented letter as distinct from its base letter, so it is safe to replace.
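The question mentions trying the usage and sensitivity options without luck, so it may help to print what your own runtime reports for each usage value. The sketch below is only an illustration: sameBaseLetter is a hypothetical helper name, not part of any library, and the results depend on the locale data shipped with the browser or Node.js build, so no expected output is shown.

```javascript
// Hypothetical helper: true when `locale` treats a and b as the same base
// letter under the given collation usage ('sort' or 'search').
function sameBaseLetter(a, b, locale, usage) {
  var collator = new Intl.Collator(locale, { usage: usage, sensitivity: 'base' });
  return collator.compare(a, b) === 0;
}

// Print both usages side by side for a few locales.
['en-US', 'es', 'de', 'sv'].forEach(function (locale) {
  console.log(
    locale,
    'sort:', sameBaseLetter('u', 'ü', locale, 'sort'),
    'search:', sameBaseLetter('u', 'ü', locale, 'search')
  );
});
```

If the two columns differ for a locale, that would explain why a plain localeCompare call and the usage: 'search' collator above disagree.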
With that, here are some tests of the strFromLocale function:
    var language = navigator.language; // en-US
    strFromLocale("cigüeña");
    >>> ciguena

    var language = 'sv' // Swedish
    strFromLocale("cigüeña");
    >>> cigüena

    var language = 'de' // German
    strFromLocale("cigüeña");
    >>> cigüena

    var language = 'es-mx' // Spanish - Mexico
    strFromLocale("cigüeña");
    >>> cigueña
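To connect this back to the filter in the question, here is a minimal sketch of how the same idea could be used for substring matching. The names foldForLocale and matchesForLocale are made up for illustration, and the locale is passed as a parameter instead of being read from a global; only Intl.Collator and String.prototype.normalize are standard APIs. The collator is created once per call because constructing one per letter is comparatively expensive.

```javascript
// Hypothetical helper: strip a diacritic only when `locale` treats the
// accented letter and its unaccented form as the same base letter.
function foldForLocale(str, locale) {
  var collator = new Intl.Collator(locale, { usage: 'search', sensitivity: 'base' });
  return str.replace(/[^\u0000-\u007E]/g, function (letter) {
    // Decompose the letter and drop its combining marks (ü -> u).
    var stripped = letter.normalize('NFD').replace(/[\u0300-\u036f]/g, '');
    return collator.compare(letter, stripped) === 0 ? stripped : letter;
  });
}

// Hypothetical replacement for the question's matches() function.
function matchesForLocale(word, search, locale) {
  var cleanWord = foldForLocale(word.toLowerCase(), locale);
  var cleanSearch = foldForLocale(search.toLowerCase(), locale);
  return cleanWord.indexOf(cleanSearch) > -1;
}

var words = ['árbol', 'cigüeña', 'nido', 'tubo'];
var filtered = words.filter(function (word) {
  return matchesForLocale(word, 'u', 'en-US');
});
console.log(filtered);
```

In an AngularJS 1.6 controller, a function like matchesForLocale could be called from a custom filter predicate with navigator.language as the locale argument.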
Answer 2
You are probably looking for the ECMAScript Internationalization API (the Intl object, specified in ECMA-402). It lets you adjust sort order based on locale, e.g.:
    // in German, ä sorts with a
    console.log(new Intl.Collator('de').compare('ä', 'z'));
    // → a negative value

    // in Swedish, ä sorts after z
    console.log(new Intl.Collator('sv').compare('ä', 'z'));
    // → a positive value
The sensitivity: 'base' option makes the comparison ignore diacritics, except where the locale treats the accented character as a separate base letter:
    // in German, ä has a as the base letter
    console.log(new Intl.Collator('de', { sensitivity: 'base' }).compare('ä', 'a'));
    // → 0

    // in Swedish, ä and a are separate base letters
    console.log(new Intl.Collator('sv', { sensitivity: 'base' }).compare('ä', 'a'));
    // → a positive value
You can then sort your list into the correct order prior to populating your UI Widget.
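As a rough sketch of that last step, the collator's compare function can be passed straight to Array.prototype.sort. The helper name sortForLocale and the sample word list are assumptions for illustration; the words are the ones from the question.

```javascript
// Hypothetical helper: return a copy of `list` sorted by the collation
// rules of `locale`, leaving the original array untouched.
function sortForLocale(list, locale) {
  var collator = new Intl.Collator(locale);
  // collator.compare is already bound to the collator, so it can be
  // handed to sort() directly.
  return list.slice().sort(collator.compare);
}

var words = ['tubo', 'nido', 'cigüeña', 'árbol'];
var sorted = sortForLocale(words, 'es');
console.log(sorted); // → [ 'árbol', 'cigüeña', 'nido', 'tubo' ]
```

A plain words.sort() would compare UTF-16 code units instead and place árbol after tubo, which is why the collator is worth using here.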