regex - keyword highlight is highlighting the highlights in PHP preg_replace()
I have a little search engine thing going on, and I want to highlight the results. I thought I had it all worked out until a set of keywords someone used today blew it out of the water.
The issue is that preg_replace() loops through the replacements, and later replacements replace text inserted by previous ones. Confused? Here is my pseudo function:
    public function highlightkeywords ($data, $keywords = array()) {
        $find = array();
        $replace = array();
        $begin = "<span class=\"keywordhighlight\">";
        $end = "</span>";
        foreach ($keywords as $kw) {
            // escape regex metacharacters (and the "/" delimiter) in the keyword
            $find[] = '/' . preg_quote($kw, '/') . '/iu';
            $replace[] = $begin . "\$0" . $end;
        }
        return preg_replace($find, $replace, $data);
    }
OK, it works when searching for "fred" and "dagg", but sadly, when searching for "class", "lass" and "as", it strikes a real issue when highlighting "Joseph's Class Group":
joseph's <span class="keywordhighlight">cl</span><span <span c<span <span class="keywordhighlight">cl</span>ass="keywordhighlight">lass</span>="keywordhighlight">c<span <span class="keywordhighlight">cl</span>ass="keywordhighlight">lass</span></span>="keywordhighlight">ass</span> group
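A smaller reproduction of the same effect may make it clearer. Each pattern in the array is applied to the output of the previous one, so the second pattern also matches the "as" inside the class attribute that the first one inserted. The class name `kw` is an arbitrary placeholder.

```php
<?php
// Patterns in an array are applied one after another, each to the
// result of the previous replacement, so inserted markup gets rescanned.
$out = preg_replace(
    array('/class/iu', '/as/iu'),
    '<span class="kw">$0</span>',
    'Class'
);
echo $out, "\n";
// <span cl<span class="kw">as</span>s="kw">Cl<span class="kw">as</span>s</span>
```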
How would I make the later replacements work only on the non-HTML parts, while also allowing the whole match to be tagged? E.g. if I were searching for "cla" and "lass", I would want "class" to be highlighted in full, since both search terms are in it, even though they overlap; and the highlighting that was applied to the first match has "class" in it, which shouldn't be highlighted.
Sigh.
I would rather use a PHP solution than a jQuery (or any client-side) one.
Note: I have tried sorting the keywords by length, doing the long ones first, but that means cross-over searches do not highlight: with "cla" and "lass", only part of the word "class" gets highlighted, and it still murdered the replacement tags :(
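One commonly used alternative to an array of patterns, sketched here as an assumption rather than as the poster's method, is to join all keywords into a single alternation (longest first) and call preg_replace once. A single pass never rescans its own output, so the inserted tags survive; as the note above says, overlapping keywords are still not merged. The function name `highlight_single_pass` is hypothetical.

```php
<?php
// Sketch: one combined pattern, longest keyword first, one preg_replace call.
// Inserted tags are never rescanned, but overlapping keywords are NOT merged.
function highlight_single_pass($data, array $keywords,
                               $prefix = '<span class="keywordhighlight">',
                               $suffix = '</span>')
{
    $keywords = array_filter($keywords, 'strlen');   // drop empty keywords
    if (!$keywords) {
        return $data;
    }
    usort($keywords, function ($a, $b) {
        return strlen($b) - strlen($a);              // longest first
    });
    $quoted = array_map(function ($kw) {
        return preg_quote($kw, '/');                 // escape regex metacharacters
    }, $keywords);
    $pattern = '/' . implode('|', $quoted) . '/iu';
    return preg_replace($pattern, $prefix . '$0' . $suffix, $data);
}

echo highlight_single_pass("Joseph's Class Group", array('class', 'lass', 'as')), "\n";
echo highlight_single_pass('class', array('cla', 'lass')), "\n";
```

The second call returns `<span class="keywordhighlight">cla</span>ss`: once "cla" is consumed, "lass" can no longer match, which is exactly the partial highlighting described above.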
EDIT: I have messed about, starting with pencil & paper and some wild ramblings, and come up with some very unglamorous code to solve this issue. It's not great, so suggestions to trim it or speed it up would still be appreciated :)
    public function highlightkeywords ($data, $keywords = array()) {
        $begin = "<span class=\"keywordhighlight\">";
        $end = "</span>";
        $hits = array();
        // record [start, end) offsets of every (possibly overlapping) hit
        foreach ($keywords as $kw) {
            if ($kw === '') continue; // an empty keyword would run past the end
            $offset = 0;
            while (($pos = stripos($data, $kw, $offset)) !== false) {
                $hits[] = array($pos, $pos + strlen($kw));
                $offset = $pos + 1;
            }
        }
        if ($hits) {
            // order hits by start offset
            usort($hits, function($a, $b) {
                if ($a[0] == $b[0]) {
                    return 0;
                }
                return ($a[0] < $b[0]) ? -1 : 1;
            });
            $thisthat = array(0 => $begin, 1 => $end);
            for ($i = 0; $i < count($hits); $i++) {
                foreach ($thisthat as $key => $val) {
                    $pos = $hits[$i][$key];
                    $data = substr($data, 0, $pos) . $val . substr($data, $pos);
                    // shift every offset at or after the insertion point
                    for ($j = 0; $j < count($hits); $j++) {
                        if ($hits[$j][0] >= $pos) {
                            $hits[$j][0] += strlen($val);
                        }
                        if ($hits[$j][1] >= $pos) {
                            $hits[$j][1] += strlen($val);
                        }
                    }
                }
            }
        }
        return $data;
    }
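One possible trim, sketched under the assumption that a single span per merged region is acceptable (the code above nests spans instead): collect the hit offsets the same way, merge overlapping or touching ranges, then insert the tags from the end of the string backwards so no offsets need re-shifting. The names `merge_ranges` and `highlight_merged` are hypothetical.

```php
<?php
// Merge overlapping or touching [start, end) ranges into disjoint ranges.
function merge_ranges(array $ranges)
{
    usort($ranges, function ($a, $b) {
        return $a[0] - $b[0];          // sort by start offset
    });
    $merged = array();
    foreach ($ranges as $r) {
        $last = count($merged) - 1;
        if ($last >= 0 && $r[0] <= $merged[$last][1]) {
            // overlaps (or touches) the previous range: extend it
            $merged[$last][1] = max($merged[$last][1], $r[1]);
        } else {
            $merged[] = $r;
        }
    }
    return $merged;
}

function highlight_merged($data, array $keywords,
                          $prefix = '<span class="keywordhighlight">',
                          $suffix = '</span>')
{
    $ranges = array();
    foreach ($keywords as $kw) {
        $len = strlen($kw);
        if ($len === 0) continue;      // skip empty keywords
        $offset = 0;
        while (($pos = stripos($data, $kw, $offset)) !== false) {
            $ranges[] = array($pos, $pos + $len);
            $offset = $pos + 1;        // +1 so overlapping hits are found too
        }
    }
    // Insert back to front so the earlier offsets stay valid.
    foreach (array_reverse(merge_ranges($ranges)) as $r) {
        $data = substr($data, 0, $r[0]) . $prefix
              . substr($data, $r[0], $r[1] - $r[0]) . $suffix
              . substr($data, $r[1]);
    }
    return $data;
}

echo highlight_merged('class', array('cla', 'lass')), "\n";
```

With "cla" and "lass" the two hits [0,3) and [1,5) merge into [0,5), so the whole word "class" ends up in one span. Like the original, this works on byte offsets, so it assumes the keywords and text compare sensibly with stripos.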
I've used the following to address this problem:
    <?php
    $protected_matches = array();

    // Pass 1 callback: stash the match and return a NUL-wrapped, 1-based index.
    function protect($matches) {
        global $protected_matches;
        return "\0" . array_push($protected_matches, $matches[0]) . "\0";
    }

    // Pass 2 callback: swap the placeholder for the highlighted original text.
    function restore($matches) {
        global $protected_matches;
        return '<span class="keywordhighlight">' . $protected_matches[$matches[1] - 1] . '</span>';
    }

    $highlighted = preg_replace_callback('/\x0(\d+)\x0/', 'restore',
        preg_replace_callback($patterns, 'protect', $target_string));
The first preg_replace_callback pulls out the matches and replaces them with NUL-byte-wrapped placeholders; the second pass replaces those placeholders with the span tags.
EDIT: I forgot to mention that $patterns is sorted by string length, longest to shortest.
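The same two-pass placeholder idea can be written without globals by capturing the stash in closures. This is a sketch, assuming one pattern per keyword built with preg_quote and sorted longest first as described above; `highlight_with_placeholders` is a hypothetical name. One caveat of the technique: a keyword containing digits could match inside an existing placeholder.

```php
<?php
// Two-pass placeholder highlighting without globals.
// Pass 1 replaces each match with "\0<n>\0" so later patterns cannot see it;
// pass 2 swaps every placeholder for the wrapped original text.
function highlight_with_placeholders($target, array $keywords)
{
    usort($keywords, function ($a, $b) {
        return strlen($b) - strlen($a);   // longest first
    });
    $patterns = array();
    foreach ($keywords as $kw) {
        if ($kw !== '') {
            $patterns[] = '/' . preg_quote($kw, '/') . '/iu';
        }
    }
    if (!$patterns) {
        return $target;
    }

    $protected = array();
    $marked = preg_replace_callback($patterns, function ($m) use (&$protected) {
        $protected[] = $m[0];
        return "\0" . count($protected) . "\0";   // 1-based placeholder index
    }, $target);

    return preg_replace_callback('/\x00(\d+)\x00/', function ($m) use ($protected) {
        return '<span class="keywordhighlight">' . $protected[$m[1] - 1] . '</span>';
    }, $marked);
}

echo highlight_with_placeholders("Joseph's Class Group", array('as', 'class', 'lass')), "\n";
```

This keeps the inserted tags intact, but like any match-and-hide approach it still highlights only "lass" in "class" when the keywords are "cla" and "lass", because the longer match hides the text the shorter one needs.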
EDIT: Solution
    <?php
    function highlightkeywords($data, $keywords = array(), $prefix = '<span class="hilite">', $suffix = '</span>') {
        $datacopy = strtolower($data);
        $keywords = array_map('strtolower', $keywords);
        $start = array();
        $end = array();
        foreach ($keywords as $keyword) {
            $offset = 0;
            $length = strlen($keyword);
            if ($length == 0) continue; // an empty keyword would never advance
            while (($pos = strpos($datacopy, $keyword, $offset)) !== false) {
                $start[] = $pos;
                $end[] = $offset = $pos + $length;
            }
        }
        if (!count($start)) return $data;
        sort($start);
        sort($end);

        // merge and sort start/end, using negative values to identify endpoints
        // (starts win ties, so touching matches end up in one span)
        $zipper = array();
        $i = 0;
        $n = count($end);
        while ($i < $n)
            $zipper[] = count($start) && $start[0] <= $end[$i] ? array_shift($start) : -$end[$i++];

        // example:
        // [ 9, 10, -14, -14, 81, 82, 86, -86, -86, -90, 99, -103 ]
        // 9 opens, 10 nests, -14 -14 close both          => pair [9,14]
        // 81 opens, 82 86 nest, -86 -86 -90 close all    => pair [81,90]
        // 99 opens, -103 closes                          => pair [99,103]
        // result: [9,14], [81,90], [99,103]

        // generate non-overlapping start/end pairs by tracking nesting depth
        $spans = array();
        $depth = 0;
        foreach ($zipper as $x) {
            if ($x >= 0) {                  // start point
                if ($depth++ == 0) $a = $x;
            } elseif (--$depth == 0) {      // end point that closes the outermost match
                $spans[] = array($a, -$x);
            }
        }

        // insert prefix/suffix at the start/end locations, last span first
        $n = count($spans);
        while ($n--)
            $data = substr($data, 0, $spans[$n][0]) . $prefix . substr($data, $spans[$n][0], $spans[$n][1] - $spans[$n][0]) . $suffix . substr($data, $spans[$n][1]);
        return $data;
    }