Fast string compare function for PHP
Hello User,
As a developer, I was faced with a challenge: I needed a fast but rough string comparison function for our website, one that estimates the percentage change between the old and the new version of a text.
I tried a few PHP functions like similar_text() and later levenshtein(). But even these native functions took over 30 seconds for long texts - far too long for our website.
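To check timings like these on your own data, a small measurement harness is enough. The following is only a sketch: timeCallMs() is a hypothetical helper, the sample strings and their size are arbitrary, and it assumes PHP 8 (older versions limit levenshtein() to 255 characters). It makes no claim about the numbers you will see.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not part of the original post): runs $fn once
 * and returns the elapsed wall-clock time in milliseconds.
 */
function timeCallMs(callable $fn): float
{
    $start = hrtime(true);                 // monotonic clock, in nanoseconds
    $fn();
    return (hrtime(true) - $start) / 1e6;  // nanoseconds -> milliseconds
}

// Illustrative input only; replace it with your real old and new texts.
$old = str_repeat('The quick brown fox jumps over the lazy dog. ', 10);
$new = str_repeat('The quick brown cat jumps over the lazy dog. ', 10);

$timings = [
    'similar_text' => timeCallMs(fn() => similar_text($old, $new)),
    'levenshtein'  => timeCallMs(fn() => levenshtein($old, $new)),
];

foreach ($timings as $name => $ms) {
    printf("%-12s %8.3f ms\n", $name, $ms);
}
```

Running both functions on the same pair of texts, and repeating the measurement with longer inputs, shows how quickly their runtime grows with text length.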
My search for a more efficient solution led me to a simple but highly effective method. Instead of complex algorithms, I now use basic text-processing techniques such as length comparison, character counting and word counting. The result? An 80% to 90% increase in performance over similar_text() and levenshtein().
My humble stringCompare() function compares two texts ($str1, $str2) and returns the difference as a percentage. It is not only fast, but also easy to understand and implement. However, it is not 100% accurate: depending on the length of the text, the result can deviate by a single-digit percentage. It focuses on quantitative aspects (length, character and word frequency) rather than qualitative ones (meaning, context). It is effective at identifying structural similarities and literal matches, but may miss more subtle differences or similarities in content.
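One concrete case of such a missed difference is word order. The sketch below is not the full function. It only mirrors the word-frequency idea under a hypothetical name, wordFrequencyChange(), and uses strtolower() so that it does not depend on the mbstring extension.

```php
<?php
declare(strict_types=1);

/**
 * Illustrative sketch (hypothetical name): half the sum of the absolute
 * differences between the relative word frequencies of two texts.
 * 0.0 = same word distribution, 1.0 = no words in common.
 */
function wordFrequencyChange(string $a, string $b): float
{
    $words1 = str_word_count(strtolower($a), 1);
    $words2 = str_word_count(strtolower($b), 1);

    // Without words on one side, relative frequencies are undefined.
    if ($words1 === [] || $words2 === []) {
        return ($words1 === $words2) ? 0.0 : 1.0;
    }

    $freq1 = array_count_values($words1);
    $freq2 = array_count_values($words2);

    $diff = 0.0;
    foreach (array_keys($freq1 + $freq2) as $word) {
        $diff += abs(
            ($freq1[$word] ?? 0) / count($words1)
            - ($freq2[$word] ?? 0) / count($words2)
        );
    }

    return $diff / 2; // each differing word is counted on both sides
}

var_dump(wordFrequencyChange('dog bites man', 'man bites dog')); // float(0)
```

Both sentences contain the same words with the same frequencies, so a frequency-based measure reports no change even though the meaning is reversed. Replacing one of the three words ('dog bites man' versus 'cat bites man') yields a change of one third.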
I would like to share this solution with the developer community in the hope that it may help with similar performance issues.
/**
 * Returns the approximate change between two texts as a percentage
 * (0.0 = no change, 100.0 = completely different).
 * cleanString() is a helper of the same class that is not shown in this post.
 */
public static function stringCompare($str1, $str2): float {
    if ($str1 === $str2) return 0.0;

    $str1 = self::cleanString($str1);
    $str2 = self::cleanString($str2);

    if ($str1 === '' && $str2 === '') return 0.0;
    if ($str1 === '' || $str2 === '') return 100.0;

    $len1 = strlen($str1);
    $len2 = strlen($str2);

    // length change
    $lenChange = abs($len1 - $len2) / max($len1, $len2);

    // Improved detection of repeated text (str_contains() requires PHP 8.0 or later)
    $repetitionFactor = 0;
    if (str_contains($str2, $str1) || str_contains($str1, $str2)) {
        $repetitions = max($len1, $len2) / min($len1, $len2);
        $repetitionFactor = 1 - (1 / $repetitions);
    }

    // character comparison (count_chars() works on bytes)
    $chars1 = count_chars($str1, 1);
    $chars2 = count_chars($str2, 1);
    $charDiff = 0;
    foreach (array_keys($chars1 + $chars2) as $i) {
        $charDiff += abs(($chars1[$i] ?? 0) / $len1 - ($chars2[$i] ?? 0) / $len2);
    }
    $charChange = $charDiff / 2; // normalisation

    // comparison of words
    $words1 = str_word_count(mb_strtolower($str1), 1);
    $words2 = str_word_count(mb_strtolower($str2), 1);
    $wordCount1 = array_count_values($words1);
    $wordCount2 = array_count_values($words2);
    $wordDiff = 0;
    $allWords = array_unique(array_merge(array_keys($wordCount1), array_keys($wordCount2)));
    $count1 = count($words1);
    $count2 = count($words2);
    if ($count1 > 0 || $count2 > 0) {
        foreach ($allWords as $word) {
            $freq1 = $count1 > 0 ? ($wordCount1[$word] ?? 0) / $count1 : 0;
            $freq2 = $count2 > 0 ? ($wordCount2[$word] ?? 0) / $count2 : 0;
            $wordDiff += abs($freq1 - $freq2);
        }
        $wordChange = $wordDiff / 2; // normalisation
    } else {
        // If no words are recognised, we only use the character comparison.
        $wordChange = $charChange;
    }

    // Weighted total change: the largest of the three indicators wins
    $overallChange = max(
        $lenChange,
        $repetitionFactor,
        ($charChange * 0.4 + $wordChange * 0.6)
    );

    return round(min($overallChange, 1.0) * 100, 2);
}
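The function calls self::cleanString(), which is not shown in the post. To try the code, some stand-in is needed. The version below is purely an assumption about what such a helper might do (strip HTML tags, collapse whitespace, trim) and is not the original implementation. Inside the class it would be declared as a private static method.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical stand-in for the cleanString() helper, which the post does not show.
 * Assumption: normalising markup and whitespace is all the cleaning that is needed.
 */
function cleanString(string $text): string
{
    $text = strip_tags($text);                            // drop HTML/XML tags
    $text = preg_replace('/\s+/u', ' ', $text) ?? $text;  // collapse whitespace; keep input if the regex fails
    return trim($text);
}

echo cleanString("  <p>Hello\n\n  world</p> "), "\n"; // prints "Hello world"
```

Because the real helper may clean the text differently, results obtained with this stand-in can differ from the figures quoted in the post.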
Example:
$str1 = "This is an example text for the comparison";
$str2 = "xxxx xx xx xxxxxxx xxxx xxx xxx xxxxxxxxxx";
echo "Percentage change (same position, replaced by x): " . TextComparison::stringCompare($str1, $str2) . "%\n";
Result: Percentage change (same position, replaced by x): 93.59%
Url: https://rootdb.com/tutorial/fast-string-compare-function-for-php-671361.html