Showing posts with label locale. Show all posts

Monday, August 27, 2018

Custom locale configuration for float conversion


I need to convert a string in the format "1.234.345,00" to the float value 1234345.00.

One way is to use repeated str.replace:

```python
x = "1.234.345,00"
res = float(x.replace('.', '').replace(',', '.'))

print(res, type(res))
# 1234345.0 <class 'float'>
```

However, this appears manual and non-generalised. This heavily upvoted answer suggests using the locale library. But my default locale doesn't have the same conventions as my input string. I then discovered a way to extract the characters used in local conventions as a dictionary:

```python
import locale

print(locale.localeconv())

# {'int_curr_symbol': '', 'currency_symbol': '', 'mon_decimal_point': '',
#  ..., 'decimal_point': '.', 'thousands_sep': '', 'grouping': []}
```

Is there a way to update this dictionary, save it as a custom locale, and then use that custom locale going forward? Something like:

```python
mylocale = locale.create_new_locale()  # "blank" conventions or copied from default
mylocale.localeconv()['thousands_sep'] = '.'
mylocale.localeconv()['decimal_point'] = ','

setlocale(LC_NUMERIC, mylocale)
atof('123.456,78')  # 123456.78
```

If this isn't possible, how do we get a list of all available locales and their conventions? It seems an anti-pattern to "deduce" the correct configuration from the conventions (not to mention inefficient and manual), so I was hoping for a generic solution such as the pseudo-code above.
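As far as I know, Python's locale module has no API for defining a brand-new locale: setlocale() can only select locales the C library already provides, and localeconv() returns a fresh dictionary on every call, so editing that dictionary changes nothing. A minimal sketch of what the pseudo-code is after, assuming only thousands_sep and decimal_point matter, is a small object that applies the same two replacement steps as locale.atof() without touching global locale state. NumericConventions is a hypothetical name, not part of any library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NumericConventions:
    """Hypothetical stand-in for a 'custom locale': it only stores the two
    numeric conventions we need and never calls setlocale()."""
    thousands_sep: str = ""
    decimal_point: str = "."

    def delocalize(self, text: str) -> str:
        # Same two steps locale.atof() performs: drop grouping, then normalise the decimal point
        if self.thousands_sep:
            text = text.replace(self.thousands_sep, "")
        if self.decimal_point:
            text = text.replace(self.decimal_point, ".")
        return text

    def atof(self, text: str) -> float:
        return float(self.delocalize(text))


# Conventions matching input such as "1.234.345,00"
european = NumericConventions(thousands_sep=".", decimal_point=",")
print(european.atof("123.456,78"))    # 123456.78
print(european.atof("1.234.345,00"))  # 1234345.0
```

Because each instance is immutable and local, several conventions can be used side by side, including from different threads, which a process-wide setlocale() call cannot offer.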


Edit: Here's my attempt at finding all locales where thousands_sep == '.' and decimal_point == ',' or, more generally, at grouping locales by combinations of these parameters:

```python
import locale
from collections import defaultdict

d = defaultdict(list)

for alias in locale.locale_alias:
    locale.setlocale(locale.LC_ALL, alias)
    env = locale.localeconv()
    d[(env['thousands_sep'], env['decimal_point'])].append(alias)
```

Result:

```
---------------------------------------------------------------------------
Error                                     Traceback (most recent call last)
<ipython-input-164-f8f6a6db7637> in <module>()
      5 
      6 for alias in locale.locale_alias:
----> 7     locale.setlocale(locale.LC_ALL, alias)
      8     env = locale.localeconv()
      9     d[(env['thousands_sep'], env['decimal_point'])].append(alias)

C:\Program Files\Anaconda3\lib\locale.py in setlocale(category, locale)
    596         # convert to string
    597         locale = normalize(_build_localename(locale))
--> 598     return _setlocale(category, locale)
    599 
    600 def resetlocale(category=LC_ALL):

Error: unsupported locale setting
```
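The loop stops because many entries in locale.locale_alias are not installed on the machine, and setlocale() raises locale.Error for those. A sketch that skips them might look like the following, assuming only LC_NUMERIC matters (it is the category that controls thousands_sep and decimal_point) and that the previous setting should be restored afterwards. It passes each alias through locale.normalize(), which looks the alias up in the same table to get a fuller locale name. group_locales_by_separators is a hypothetical helper name, and the groups it finds depend entirely on which locales the operating system has installed.

```python
import locale
from collections import defaultdict


def group_locales_by_separators():
    """Group locale aliases by (thousands_sep, decimal_point), skipping any
    alias the operating system cannot load."""
    groups = defaultdict(list)
    saved = locale.setlocale(locale.LC_NUMERIC)  # query the current setting
    try:
        for alias in locale.locale_alias:
            try:
                # LC_NUMERIC alone determines the two keys we group by
                locale.setlocale(locale.LC_NUMERIC, locale.normalize(alias))
                conv = locale.localeconv()
            except (locale.Error, ValueError):
                continue  # not installed or unsupported on this machine
            groups[(conv["thousands_sep"], conv["decimal_point"])].append(alias)
    finally:
        locale.setlocale(locale.LC_NUMERIC, saved)  # restore what we found
    return dict(groups)


groups = group_locales_by_separators()
# Aliases installed here whose conventions match "1.234.345,00"
print(groups.get((".", ","), []))
```

Since setlocale() changes process-wide state, this is best run once at start-up (or offline) rather than on every conversion.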

3 Answers

Answers 1

If you pop open the source code for locale, you can see that there is a variable called _override_localeconv (which seems to be for testing purposes).

```python
# With this dict, you can override some items of localeconv's return value.
# This is useful for testing purposes.
_override_localeconv = {}
```

Trying the following does seem to override the dictionary without changing the entire locale, though it probably has some unintended consequences, especially since changing locales isn't thread-safe. Be careful!

```python
import locale

locale._override_localeconv["thousands_sep"] = "."
locale._override_localeconv["decimal_point"] = ","

print(locale.atof('123.456,78'))  # 123456.78
```
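Because _override_localeconv is a single module-level dictionary, anything stored in it stays in effect for the rest of the process. One way to limit that, sketched here under the assumption that the override is only needed for a short block of code, is a context manager that restores the previous contents afterwards. numeric_overrides is a hypothetical name; the hook itself is private and undocumented, so it could change between Python versions, and this does nothing to make the shared dictionary thread-safe.

```python
import contextlib
import locale


@contextlib.contextmanager
def numeric_overrides(**overrides):
    """Temporarily override keys of locale.localeconv() through the private
    locale._override_localeconv hook (a CPython implementation detail)."""
    saved = dict(locale._override_localeconv)
    locale._override_localeconv.update(overrides)
    try:
        yield
    finally:
        # Put back exactly what was there before, even if the block raised
        locale._override_localeconv.clear()
        locale._override_localeconv.update(saved)


with numeric_overrides(thousands_sep=".", decimal_point=","):
    value = locale.atof("123.456,78")

print(value)  # 123456.78
```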


Answers 2

Here's something, using Babel, that works for me.

First, you feed it some test data along with your expected results, and it builds a dictionary mapping each separator pattern to a locale alias that fits.

Then you can convert from that point on.

```python
import string
from decimal import Decimal
from babel.numbers import parse_decimal, NumberFormatError
from babel.core import UnknownLocaleError
import locale

traindata = [
    ("1.234.345,00", Decimal("1234345.00")),
    ("1,234,345.00", Decimal("1234345.00")),
    ("345", Decimal("345.00")),
]

data = traindata + [
    ("345,00", Decimal("345.00")),
    ("345.00", Decimal("345.00")),
    ("746", Decimal("746.00")),
]

def findseps(input_):
    # you need to have no separator
    # or at least a decimal separator for this to work...
    seps = [c for c in input_ if not c in string.digits]
    if not seps:
        return ""

    sep = seps[-1]
    # if the decimal is something then thousand will be the other...
    seps = "." + sep if sep == "," else "," + sep
    return seps


def setup(input_, exp, lookup):
    key = findseps(input_)

    if key in lookup:
        return

    for alias in locale.locale_alias:
        try:
            got = parse_decimal(input_, locale=alias)
        except (NumberFormatError, UnknownLocaleError, ValueError) as e:
            continue
        except (Exception,) as e:
            raise
        if exp == got:
            lookup[key] = alias
            return


def convert(input_, lookup):
    seps = findseps(input_)
    try:
        locale_ = lookup[seps]
        convert.locale_ = locale_
    except (KeyError,) as e:
        convert.locale_ = None
        return "unexpected seps:%s" % seps

    try:
        return parse_decimal(input_, locale=locale_)
    except (Exception,) as e:
        return e


lookup = {}

# train your data
for input_, exp in traindata:
    setup(input_, exp, lookup)

# once it's trained you know which locales to use
print(data)

for input_, exp in data:
    got = convert(input_, lookup)
    msg = "%s => %s with local:%s:" % (input_, got, convert.locale_)
    if exp == got:
        print("\n  success : " + msg)
    else:
        print("\n  failure : " + msg)

print(lookup)
```

output:

```
[('1.234.345,00', Decimal('1234345.00')), ('1,234,345.00', Decimal('1234345.00')), ('345', Decimal('345.00')), ('345,00', Decimal('345.00')), ('345.00', Decimal('345.00')), ('746', Decimal('746.00'))]

  success : 1.234.345,00 => 1234345.00 with local:is_is:

  success : 1,234,345.00 => 1234345.00 with local:ko_kr.euc:

  success : 345 => 345 with local:ko_kr.euc:

  success : 345,00 => 345.00 with local:is_is:

  success : 345.00 => 345.00 with local:ko_kr.euc:

  success : 746 => 746 with local:ko_kr.euc:
{',.': 'ko_kr.euc', '': 'ko_kr.euc', '.,': 'is_is'}
```

Answers 3

There are two parts in your question:

  1. How can I parse '1.234.345,00' in a generic way?
  2. How can I easily find the locale associated to '1.234.345,00'?

You can use the amazing Babel library for both.

How can I parse '1.234.345,00' in a generic way?

One locale that uses '.' as the thousands separator and ',' as the decimal separator is ger_de (German).

To parse it, simply use

```python
>>> from babel.numbers import parse_decimal
>>> parse_decimal('1.234.345,00', locale='ger_de')
Decimal('1234345.00')
```

How can I easily find the locale associated to '1.234.345,00'?

Use this routine, which parses the string under every locale alias and returns the aliases whose result matches the expected value:

```python
import locale
from babel.numbers import parse_decimal
from decimal import Decimal

def get_compatible_locales(string_to_parse, expected_decimal):
    compatible_aliases = []
    for alias in locale.locale_alias:
        try:
            parsed_decimal = parse_decimal(string_to_parse, locale=alias)
            if parsed_decimal == expected_decimal:
                compatible_aliases.append(alias)
        except Exception:
            continue
    return compatible_aliases
```

For your example:

```python
>>> print(get_compatible_locales('1.234.345,00', Decimal('1234345')))
['ar_dz', 'ar_lb', 'ar_ly', 'ar_ma', 'ar_tn', 'ast_es', 'az', 'az_az', 'az_az.iso88599e', 'bs', 'bs_ba', 'ca', 'ca_ad', 'ca_es', 'ca_es@valencia', 'ca_fr', 'ca_it', 'da', 'da_dk', 'de', 'de_at', 'de_be', 'de_de', 'de_lu', 'el', 'el_cy', 'el_gr', 'el_gr@euro', 'en_be', 'en_dk', 'es', 'es_ar', 'es_bo', 'es_cl', 'es_co', 'es_ec', 'es_es', 'es_py', 'es_uy', 'es_ve', 'eu', 'eu_es', 'fo', 'fo_fo', 'fr_lu', 'fy_nl', 'ger_de', 'gl', 'gl_es', 'hr', 'hr_hr', 'hsb_de', 'id', 'id_id', 'in', 'in_id', 'is', 'is_is', 'it', 'it_it', 'kl', 'kl_gl', 'km_kh', 'lb_lu', 'lo', 'lo_la', 'lo_la.cp1133', 'lo_la.ibmcp1133', 'lo_la.mulelao1', 'mk', 'mk_mk', 'nl', 'nl_aw', 'nl_be', 'nl_nl', 'ps_af', 'pt', 'pt_br', 'ro', 'ro_ro', 'rw', 'rw_rw', 'sl', 'sl_si', 'sr', 'sr@cyrillic', 'sr@latn', 'sr_cs', 'sr_cs.iso88592@latn', 'sr_cs@latn', 'sr_me', 'sr_rs', 'sr_rs@latn', 'sr_yu', 'sr_yu.cp1251@cyrillic', 'sr_yu.iso88592', 'sr_yu.iso88595', 'sr_yu.iso88595@cyrillic', 'sr_yu.microsoftcp1251@cyrillic', 'sr_yu.utf8', 'sr_yu.utf8@cyrillic', 'sr_yu@cyrillic', 'tr', 'tr_cy', 'tr_tr', 'vi', 'vi_vn', 'vi_vn.tcvn', 'vi_vn.tcvn5712', 'vi_vn.viscii', 'vi_vn.viscii111', 'wo_sn']
```

Bonus: How can I have a human-readable version of these locales?

Use the following routine, where my_locale should be your own locale:

```python
from babel import Locale

def get_display_name(alias, my_locale='en_US'):
    l = Locale.parse(alias)
    return l.get_display_name(my_locale)
```

You can then use it this way:

```python
>>> locales = get_compatible_locales('1.234.345,00', Decimal('1234345'))
>>> print({loc: get_display_name(loc) for loc in locales})
{'ar_dz': 'Arabic (Algeria)', 'ar_lb': 'Arabic (Lebanon)', 'ar_ly': 'Arabic (Libya)', 'ar_ma': 'Arabic (Morocco)', 'ar_tn': 'Arabic (Tunisia)', 'ast_es': 'Asturian (Spain)', 'az': 'Azerbaijani', 'az_az': 'Azerbaijani (Latin, Azerbaijan)', 'az_az.iso88599e': 'Azerbaijani (Latin, Azerbaijan)', 'bs': 'Bosnian', 'bs_ba': 'Bosnian (Latin, Bosnia & Herzegovina)', 'ca': 'Catalan', 'ca_ad': 'Catalan (Andorra)', 'ca_es': 'Catalan (Spain)', 'ca_es@valencia': 'Catalan (Spain)', 'ca_fr': 'Catalan (France)', 'ca_it': 'Catalan (Italy)', 'da': 'Danish', 'da_dk': 'Danish (Denmark)', 'de': 'German', 'de_at': 'German (Austria)', 'de_be': 'German (Belgium)', 'de_de': 'German (Germany)', 'de_lu': 'German (Luxembourg)', 'el': 'Greek', 'el_cy': 'Greek (Cyprus)', 'el_gr': 'Greek (Greece)', 'el_gr@euro': 'Greek (Greece)', 'en_be': 'English (Belgium)', 'en_dk': 'English (Denmark)', 'es': 'Spanish', 'es_ar': 'Spanish (Argentina)', 'es_bo': 'Spanish (Bolivia)', 'es_cl': 'Spanish (Chile)', 'es_co': 'Spanish (Colombia)', 'es_ec': 'Spanish (Ecuador)', 'es_es': 'Spanish (Spain)', 'es_py': 'Spanish (Paraguay)', 'es_uy': 'Spanish (Uruguay)', 'es_ve': 'Spanish (Venezuela)', 'eu': 'Basque', 'eu_es': 'Basque (Spain)', 'fo': 'Faroese', 'fo_fo': 'Faroese (Faroe Islands)', 'fr_lu': 'French (Luxembourg)', 'fy_nl': 'Western Frisian (Netherlands)', 'ger_de': 'German (Germany)', 'gl': 'Galician', 'gl_es': 'Galician (Spain)', 'hr': 'Croatian', 'hr_hr': 'Croatian (Croatia)', 'hsb_de': 'Upper Sorbian (Germany)', 'id': 'Indonesian', 'id_id': 'Indonesian (Indonesia)', 'in': 'Indonesian (Indonesia)', 'in_id': 'Indonesian (Indonesia)', 'is': 'Icelandic', 'is_is': 'Icelandic (Iceland)', 'it': 'Italian', 'it_it': 'Italian (Italy)', 'kl': 'Kalaallisut', 'kl_gl': 'Kalaallisut (Greenland)', 'km_kh': 'Khmer (Cambodia)', 'lb_lu': 'Luxembourgish (Luxembourg)', 'lo': 'Lao', 'lo_la': 'Lao (Laos)', 'lo_la.cp1133': 'Lao (Laos)', 'lo_la.ibmcp1133': 'Lao (Laos)', 'lo_la.mulelao1': 'Lao (Laos)', 'mk': 'Macedonian', 'mk_mk': 'Macedonian (Macedonia)', 'nl': 'Dutch', 'nl_aw': 'Dutch (Aruba)', 'nl_be': 'Dutch (Belgium)', 'nl_nl': 'Dutch (Netherlands)', 'ps_af': 'Pashto (Afghanistan)', 'pt': 'Portuguese', 'pt_br': 'Portuguese (Brazil)', 'ro': 'Romanian', 'ro_ro': 'Romanian (Romania)', 'rw': 'Kinyarwanda', 'rw_rw': 'Kinyarwanda (Rwanda)', 'sl': 'Slovenian', 'sl_si': 'Slovenian (Slovenia)', 'sr': 'Serbian', 'sr@cyrillic': 'Serbian', 'sr@latn': 'Serbian', 'sr_cs': 'Serbian (Cyrillic, Serbia)', 'sr_cs.iso88592@latn': 'Serbian (Cyrillic, Serbia)', 'sr_cs@latn': 'Serbian (Cyrillic, Serbia)', 'sr_me': 'Serbian (Latin, Montenegro)', 'sr_rs': 'Serbian (Cyrillic, Serbia)', 'sr_rs@latn': 'Serbian (Cyrillic, Serbia)', 'sr_yu': 'Serbian (Cyrillic, Serbia)', 'sr_yu.cp1251@cyrillic': 'Serbian (Cyrillic, Serbia)', 'sr_yu.iso88592': 'Serbian (Cyrillic, Serbia)', 'sr_yu.iso88595': 'Serbian (Cyrillic, Serbia)', 'sr_yu.iso88595@cyrillic': 'Serbian (Cyrillic, Serbia)', 'sr_yu.microsoftcp1251@cyrillic': 'Serbian (Cyrillic, Serbia)', 'sr_yu.utf8': 'Serbian (Cyrillic, Serbia)', 'sr_yu.utf8@cyrillic': 'Serbian (Cyrillic, Serbia)', 'sr_yu@cyrillic': 'Serbian (Cyrillic, Serbia)', 'tr': 'Turkish', 'tr_cy': 'Turkish (Cyprus)', 'tr_tr': 'Turkish (Turkey)', 'vi': 'Vietnamese', 'vi_vn': 'Vietnamese (Vietnam)', 'vi_vn.tcvn': 'Vietnamese (Vietnam)', 'vi_vn.tcvn5712': 'Vietnamese (Vietnam)', 'vi_vn.viscii': 'Vietnamese (Vietnam)', 'vi_vn.viscii111': 'Vietnamese (Vietnam)', 'wo_sn': 'Wolof (Senegal)'}
```



Saturday, April 28, 2018

std::locale/std::facet Critical section


This is out of curiosity. In the past I've seen performance degradation in functions like boost::to_lower because of the CriticalSection employed in std::use_facet when the lazy facet is allocated. As far as I remember, there was a bug involving a global lock on the locale, but according to Stephan Lavavej it was fixed in VS2013. And voilà, yesterday I saw this lock on the facet killing server performance, so I guess I'm confusing two different issues.
But why is there a CriticalSection around the lazy facet in the first place? Obviously it will ruin performance. Why didn't they resort to some kind of upgradable lock or atomic operations on pointers?

1 Answers

Answers 1

MSVC++'s std::locale is implemented in terms of the underlying C function setlocale. That touches global state, and must therefore be protected by a lock.

Changing the locking semantics of a data structure is unfortunately an ABI breaking change, so not much we'll be able to do about it for a while.


Thursday, March 17, 2016

How to change locale to use Latin Serbian (instead of Cyrillic Serbian)


The Serbian language has Latin and Cyrillic alphabets. In Android's Date and Time Picker widgets, the displayed alphabet for Serbian locales seems to be Cyrillic, as seen here.


I want to change the locale so that the Android widgets use the Latin Serbian alphabet.

The current language and country codes (yielding Cyrillic) are sr and RS, respectively. Therefore, my setLocale function is called as

setLocale("sr", "RS"); 

This is the part I'm not sure about: according to localeplanet.com, the locale code for Latin Serbian is sr_Latn_RS. However, I tried both

```java
setLocale("sr_Latn", "RS");
// and
setLocale("sr_Latn_RS", "RS");
```

Neither of these works (nothing changes, and it defaults to English). According to the Android documentation, it looks like setLocale expects two-letter codes:

The language codes are two-letter lowercase ISO language codes (such as "en") as defined by ISO 639-1. The country codes are two-letter uppercase ISO country codes (such as "US") as defined by ISO 3166-1. The variant codes are unspecified.

So how do I specify Latin Serbian?

2 Answers

Answers 1

Please search for your query before posting a question; it may already be answered in some related form.

I found these two answers relevant to your query: android custom date-picker SO and locale from english to french.

Answers 2

Can you please try the code below?

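```java
import android.app.Application;
import android.content.res.Configuration;
import android.content.res.Resources;
import android.preference.PreferenceManager;

import java.util.Locale;

public class MyApplication extends Application {
    @Override
    public void onCreate() {
        super.onCreate();

        Resources res = this.getResources();
        Configuration conf = res.getConfiguration();
        // "latin_alphabet" is a placeholder key: use whichever boolean preference your app stores
        boolean isLatinAlphabet = PreferenceManager.getDefaultSharedPreferences(this)
                .getBoolean("latin_alphabet", false);

        if (conf.locale.getLanguage().equals("sr") && isLatinAlphabet) {
            conf.locale = new Locale("sr", "YourCountryCode");
            res.updateConfiguration(conf, res.getDisplayMetrics());
        }
    }
}
```

Note: replace YourCountryCode in the conf.locale = new Locale("sr", "YourCountryCode"); line with your own country code.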

Manifest.xml:

```xml
<application
        android:name=".MyApplication"
        android:icon="@drawable/ic_launcher"
        android:label="@string/application_name"
        android:theme="@style/AppTheme">
    ...
</application>
```

Hope this will help you.
