Sunday, October 29, 2017

Uppercase surname, excluding the lowercase prefix section of a surname


I am trying to find a way to uppercase a surname while leaving its prefix (such as Mac, Mc, Le or de) unchanged.

Example of names and their conversion:

  • MacArthur -> MacARTHUR
  • McDavid -> McDAVID
  • LeBlanc -> LeBLANC
  • McIntyre -> McINTYRE
  • de Wit -> de WIT

Some surnames start with the same letters but have no internal capital, and these need to be fully capitalized, so simply detecting the prefix with a function such as strchr() would not suffice:

  • Macmaster -> MACMASTER
  • Macintosh -> MACINTOSH

The PHP function mb_strtoupper() is not appropriate on its own, as it capitalizes the complete string. Similarly, strtoupper() is not appropriate, and it also mishandles accented characters in accented names.
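To make the two shortcomings concrete, here is a small sketch. It assumes the strings are UTF-8 and that the mbstring extension is loaded; the variable names are only illustrative.

```php
<?php
// mb_strtoupper() uppercases the whole string, including the prefix.
$whole = mb_strtoupper('de Wit', 'UTF-8');       // "DE WIT"

// It does handle accented letters correctly.
$accented = mb_strtoupper('Lindström', 'UTF-8'); // "LINDSTRÖM"

// strtoupper() is byte-based. On PHP 8.2+ it only converts ASCII letters,
// so the "ö" is left in lowercase; on older versions the result depends on
// the current locale.
$byteBased = strtoupper('Lindström');

echo $whole, "\n", $accented, "\n", $byteBased, "\n";
```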

There are some answers around SO that partly answer the question, such as "Capitalization using PHP". However, their common shortfall is assuming that every surname beginning with Mac is followed by a capital letter.

The names are capitalized properly in the database, so we can assume that a name spelled Macarthur is correct for one person and that MacArthur is correct for another.
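One way to pin the requirement down is to write the examples from the question as a table of expected conversions and check a candidate function against it. The helper name failingCases() is hypothetical and not part of the original question; the sketch assumes UTF-8 input.

```php
<?php
// Expected conversions, copied from the examples in the question.
$expected = [
    'MacArthur' => 'MacARTHUR',
    'McDavid'   => 'McDAVID',
    'LeBlanc'   => 'LeBLANC',
    'McIntyre'  => 'McINTYRE',
    'de Wit'    => 'de WIT',
    'Macmaster' => 'MACMASTER',
    'Macintosh' => 'MACINTOSH',
];

// Hypothetical helper: returns input => actual output for every case
// where $convert disagrees with the expected result.
function failingCases(callable $convert, array $expected): array
{
    $failures = [];
    foreach ($expected as $input => $want) {
        $got = $convert($input);
        if ($got !== $want) {
            $failures[$input] = $got;
        }
    }
    return $failures;
}

// Plain mb_strtoupper() only gets the names without an inner capital right.
$failures = failingCases(
    function ($name) { return mb_strtoupper($name, 'UTF-8'); },
    $expected
);
print_r(array_keys($failures));
```

Any of the approaches in the answers can be passed to the helper in place of the closure to see which of the question's examples it handles.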

7 Answers

Answers 1

Going with the rule to capitalise everything after the last capital letter:

    preg_replace_callback(
        '/\p{Lu}\p{Ll}+$/u',
        function ($m) { return mb_strtoupper($m[0]); },
        $name
    )

\p{Lu} and \p{Ll} match Unicode uppercase and lowercase letters respectively, and mb_strtoupper is Unicode-aware. For a simple ASCII-only variant, this would do too:

    preg_replace_callback(
        '/[A-Z][a-z]+$/',
        function ($m) { return strtoupper($m[0]); },
        $name
    )
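As a usage sketch, the Unicode-aware expression can be wrapped in a function. The function name uppercaseAfterLastCapital() and the explicit 'UTF-8' argument are assumptions added here, not part of the answer. Note that a name with no capital letter at all does not match the pattern and is returned unchanged.

```php
<?php
// Hypothetical wrapper around the Unicode-aware expression above.
function uppercaseAfterLastCapital(string $name): string
{
    $result = preg_replace_callback(
        '/\p{Lu}\p{Ll}+$/u',
        function (array $m): string {
            // $m[0] is the last capital plus the lowercase letters after it.
            return mb_strtoupper($m[0], 'UTF-8');
        },
        $name
    );

    // preg_replace_callback() returns null on error (e.g. invalid UTF-8);
    // fall back to the original name in that case.
    return $result ?? $name;
}

foreach (['MacArthur', 'Macmaster', 'de Wit', 'johnson'] as $name) {
    echo $name, ' -> ', uppercaseAfterLastCapital($name), "\n";
}
```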

Answers 2

I believe this is a solution to the question:

    $names = array(
        'MacArthur',
        'Macarthur',
        'ÜtaTest',
        'de Wit'
    );

    // The "u" modifier makes \p{Lu} and "." operate on UTF-8 characters rather than bytes.
    $pattern = '~(?<prefix>(?:\p{Lu}.+|.+\s+))(?<suffix>\p{Lu}.*)~u';
    foreach ($names as $key => $name) {
        if (preg_match($pattern, $name, $matches)) {
            $names[$key] = $matches['prefix'] . mb_strtoupper($matches['suffix']);
        } else {
            $names[$key] = mb_strtoupper($name);
        }
    }

    print_r($names);

It produces the following result for the input array above:

    Array
    (
        [0] => MacARTHUR
        [1] => MACARTHUR
        [2] => ÜtaTEST
        [3] => de WIT
    )

Brief explanation of the regular expression:

    (?<prefix>          # named group "prefix"
        (?:             # non-capturing group
            \p{Lu}.+    # an uppercase letter followed by one or more characters
            |           # OR
            .+\s+       # one or more characters followed by whitespace
        )
    )
    (?<suffix>          # named group "suffix"
        \p{Lu}.*        # an uppercase letter followed by any remaining characters
    )

Answers 3

Here's a basic algorithm that avoids cryptic regular expressions:

  1. Create a multibyte-safe character array for the literal surname (as it exists in the database).
  2. Create a second character array in multibyte-safe capitalized form.
  3. Intersect both arrays to determine the index of the final capitalized character.
  4. Concatenate the literal surname through the index with the capitalized form after the index.

In code form:

    <?php
    $names = [
        'MacArthur',
        'McDavid',
        'LeBlanc',
        'McIntyre',
        'de Wit',
        'Macmaster',
        'Macintosh',
        'MacMac',
        'die Über',
        'Van der Beek',
        'johnson',
        'Lindström',
        'Cehlárik',
    ];

    // Uppercase after the last capital letter
    function normalizeSurname($name) {
        // Split surname into a Unicode character array
        $chars = preg_split('//u', $name, -1, PREG_SPLIT_NO_EMPTY);

        // Capitalize surname and split into a character array
        $name_upper = mb_convert_case($name, MB_CASE_UPPER);
        $chars_upper = preg_split('//u', $name_upper, -1, PREG_SPLIT_NO_EMPTY);

        // Find the index of the last capital letter (0 if there is none)
        $capital_keys = array_keys(array_intersect($chars, $chars_upper));
        $last_capital_idx = $capital_keys ? end($capital_keys) : 0;

        // Concatenate the literal surname up to the index, and capitalized surname thereafter
        return mb_substr($name, 0, $last_capital_idx) . mb_substr($name_upper, $last_capital_idx);
    }

    // Loop through the surnames and display in normalized form
    foreach ($names as $name) {
        printf("%s -> %s\n", $name, normalizeSurname($name));
    }

You'll get output like:

    MacArthur -> MacARTHUR
    McDavid -> McDAVID
    LeBlanc -> LeBLANC
    McIntyre -> McINTYRE
    de Wit -> de WIT
    Macmaster -> MACMASTER
    Macintosh -> MACINTOSH
    MacMac -> MacMAC
    die Über -> die ÜBER
    Van der Beek -> Van der BEEK
    johnson -> JOHNSON
    Lindström -> LINDSTRÖM
    Cehlárik -> CEHLÁRIK

This makes the assumption that an entirely lowercase surname should be capitalized. It would be easy to change that behavior.
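As one possible way to change that behavior, the sketch below makes the all-lowercase case an explicit option. It is not the answer's code: the function name normalizeSurnameStrict() and its flag are hypothetical, it finds the last capital with a per-character loop instead of array_intersect(), and it assumes UTF-8 input and PHP 7.4+ (for mb_str_split()).

```php
<?php
// Hypothetical variant: an entirely lowercase surname is left untouched
// unless $uppercaseIfNoCapital is set to true.
function normalizeSurnameStrict(string $name, bool $uppercaseIfNoCapital = false): string
{
    $chars = mb_str_split($name, 1, 'UTF-8');

    // Remember the index of the last character that lowercasing would change,
    // i.e. the last capital letter.
    $lastCapital = null;
    foreach ($chars as $i => $char) {
        if (mb_strtolower($char, 'UTF-8') !== $char) {
            $lastCapital = $i;
        }
    }

    if ($lastCapital === null) {
        return $uppercaseIfNoCapital ? mb_strtoupper($name, 'UTF-8') : $name;
    }

    // Keep everything before the last capital, uppercase the rest.
    return mb_substr($name, 0, $lastCapital, 'UTF-8')
        . mb_strtoupper(mb_substr($name, $lastCapital, null, 'UTF-8'), 'UTF-8');
}

echo normalizeSurnameStrict('johnson'), "\n";
echo normalizeSurnameStrict('johnson', true), "\n";
echo normalizeSurnameStrict('Van der Beek'), "\n";
```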

Answers 4

    $string = "McBain";
    preg_match('/([A-Z][a-z]+\h*)$/', $string, $matches);

    // Fall back to uppercasing the whole string if no match is found
    if (!empty($matches[1])) {
        // $upperString = str_replace($matches[1], strtoupper($matches[1]), $string);
        // Replace only the last occurrence of the matched string:
        $pos = strrpos($string, $matches[1]);
        if ($pos !== false) {
            $upperString = substr_replace($string, strtoupper($matches[1]), $pos, strlen($matches[1]));
        }
    } else {
        $upperString = strtoupper($string);
    }
    print $upperString;

Example Output:

    $string = "McBain ";
    $upperString = "McBAIN ";

    $string = "Mac Hartin";
    $upperString = "Mac HARTIN";

    $string = "Macaroni ";
    $upperString = "MACARONI ";

    $string = "jacaroni";
    $upperString = "JACARONI";

    $string = "MacMac";
    $upperString = "MacMAC";

(I also added \h* to the regex to catch any trailing horizontal whitespace.)

Reference for find/replace of the last occurrence.
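The strrpos() plus substr_replace() step can be isolated into a small helper, sketched here. The name replaceLastOccurrence() is hypothetical; like the code above it works on bytes, so it is only safe when the search string is found as a whole byte sequence.

```php
<?php
// Hypothetical helper: replace only the last occurrence of $search in $haystack.
function replaceLastOccurrence(string $haystack, string $search, string $replacement): string
{
    // An empty search string has no meaningful "last occurrence".
    if ($search === '') {
        return $haystack;
    }

    $pos = strrpos($haystack, $search);
    if ($pos === false) {
        return $haystack; // nothing to replace
    }

    return substr_replace($haystack, $replacement, $pos, strlen($search));
}

echo replaceLastOccurrence('MacMac', 'Mac', 'MAC'), "\n";
```

A plain str_replace() would turn both occurrences into "MAC", which is why the answer comments that line out.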

Answers 5

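    <?php
    // Note: str_split() and ctype_upper() work on bytes, so this is ASCII-only.
    // A name with a single capital (e.g. "Macmaster") is left unchanged.
    $string = "MacArthur";
    $count = 0;
    $finished = "";
    $chars = str_split($string);
    foreach ($chars as $char) {
        // Count the capital letters seen so far
        if (ctype_upper($char)) {
            $count++;
        }
        // From the second capital onwards, uppercase each character
        if ($count == 2) {
            $finished .= strtoupper($char);
        } else {
            $finished .= $char;
        }
    }

    echo $finished;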

Answers 6

Here is code that uppercases everything from the last uppercase letter to the end of the string.

    preg_replace_callback('/[A-Z][^A-Z]+$/', function ($match) {
        return strtoupper($match[0]);
    }, $str);

Try it with test examples from your question: https://repl.it/NYcR/5

Answers 7

Just to differ from the rest of the answers, you could try something like this.

    $names = array(
        'MacArthur',
        'Macarthur',
        'ÜtaTest',
        'de Wit'
    );

    function fixSurnameA($item) {
        $lname = mb_strtolower($item);
        $nameArrayA = str_split($item, 1);
        $nameArrayB = str_split($lname, 1);
        $result = array_diff($nameArrayA, $nameArrayB);
        $keys = array_keys($result);
        $key = max($keys);
        if (count($keys) >= 2 or (count($keys) == 1 and $key > 0)) {
            $pre = substr($item, 0, $key);
            $suf = mb_strtoupper(substr($item, $key));
            echo $pre . $suf . "\n";
        } else {
            echo $item . "\n";
        }
    }

    function fixSurnameB($item) {
        $lname = mb_strtolower($item);
        $nameArrayA = str_split($item, 1);
        $nameArrayB = str_split($lname, 1);
        $result = array_diff($nameArrayA, $nameArrayB);
        $keys = array_keys($result);
        $key = max($keys);
        $pre = substr($item, 0, $key);
        $suf = mb_strtoupper(substr($item, $key));
        echo $pre . $suf . "\n";
    }

    array_walk($names, 'fixSurnameA');
    /* MacARTHUR
       Macarthur
       ÜtaTEST
       de WIT
    */

    array_walk($names, 'fixSurnameB');
    /* MacARTHUR
       MACARTHUR
       ÜtaTEST
       de WIT
    */

Test this on PHP SandBox
