I would start by reading up on regex in PHP. For example:
$floorarray = preg_split("/\sfloor\s/i", $floorstring)
Other useful functions are preg_grep
, preg_match
, etc
Edit: added a more complete solution.
This solution takes as an input a string describing the floor. It can be of various formats such as:
- Floor 102
- Floor One-hundred two
- Floor One hundred and two
- One-hundred second floor
- 102nd floor
- 102ND FLOOR
- etc
Until I can look at an example input file, I am just guessing from your post that this will be adequate.
<?php
$errorLog = 'error-log.txt'; // a file to catalog bad entries with bad floors
// These are a few example inputs
$addressArray = array('Fifty-second Floor', 'somefloor', '54th floor', '52qd floor',
'forty forty second floor', 'five nineteen hundredth floor', 'floor fifty-sixth second ninth');
foreach ($addressArray as $id => $address) {
$floor = parseFloor($id, $address);
if ( empty($floor) ) {
error_log('Entry '.$id.' is invalid: '.$address."\n", 3, $errorLog);
} else {
echo 'Entry '.$id.' is on floor '.$floor."\n";
}
}
function parseFloor($id, $address)
{
$floorString = implode(preg_split('/(^|\s)floor($|\s)/i', $address));
if ( preg_match('/(^|^\s)(\d+)(st|nd|rd|th)*($|\s$)/i', $floorString, $matchArray) ) {
// floorString contained a valid numerical floor
$floor = $matchArray[2];
} elseif ( ($floor = word2num($floorString)) != FALSE ) { // note assignment op not comparison
// floorString contained a valid english ordinal for a floor
; // No need to do anything
} else {
// floorString did not contain a properly formed floor
$floor = FALSE;
}
return $floor;
}
function word2num( $inputString )
{
$cards = array('zero',
'one', 'two', 'three', 'four', 'five', 'six', 'seven', 'eight', 'nine', 'ten',
'eleven', 'twelve', 'thirteen', 'fourteen', 'fifteen', 'sixteen', 'seventeen', 'eighteen', 'nineteen', 'twenty');
$cards[30] = 'thirty'; $cards[40] = 'forty'; $cards[50] = 'fifty'; $cards[60] = 'sixty';
$cards[70] = 'seventy'; $cards[80] = 'eighty'; $cards[90] = 'ninety'; $cards[100] = 'hundred';
$ords = array('zeroth',
'first', 'second', 'third', 'fourth', 'fifth', 'sixth', 'seventh', 'eighth', 'ninth', 'tenth',
'eleventh', 'twelfth', 'thirteenth', 'fourteenth', 'fifteenth', 'sixteenth', 'seventeenth', 'eighteenth', 'nineteenth', 'twentieth');
$ords[30] = 'thirtieth'; $ords[40] = 'fortieth'; $ords[50] = 'fiftieth'; $ords[60] = 'sixtieth';
$ords[70] = 'seventieth'; $ords[80] = 'eightieth'; $ords[90] = 'ninetieth'; $ords[100] = 'hundredth';
// break the string at any whitespace, dash, comma, or the word 'and'
$words = preg_split( '/([\s-,](?!and\s)|\sand\s)/i', $inputString );
$sum = 0;
foreach ($words as $word) {
$word = strtolower($word);
$value = array_search($word, $ords); // try the ordinal words
if (!$value) { $value = array_search($word, $cards); } // try the cardinal words
if (!$value) {
// if temp is still false, it's not a known number word, fail and exit
return FALSE;
}
if ($value == 100) { $sum *= 100; }
else { $sum += $value; }
}
return $sum;
}
?>
In the general case, parsing words into numbers is not easy. The best thread that I could find that discusses this is here. It is not nearly as easy as the inverse problem of converting numbers into words. My solution only works for numbers <2000, and it liberally interprets poorly formed constructs rather than tossing an error. Also, it is not resilient against spelling mistakes at all. For example:
- forty forty second = 82
- five nineteen hundredth = 2400
- fifty-sixth second ninth = 67
If you have a lot of inputs and most of them are well formed, throwing errors for spelling mistakes is not really a big deal because you can manually correct the short list of problem entries. Silently accepting bad input, however, could be a real problem depending on your application. Just something to think about when deciding if it is worth it to make the conversion code more robust.