[Patch] enable unicode for filter regexps

Post by Gravemind » 01 Oct 2016, 16:34

I had issues with feeds not being matched by some of my filter regexps. I looked into it and found that at least the "\b" rule needs unicode enabled to work properly on some feeds (probably unicode-encoded feeds? I don't know).

For example, the filter regexp:

Code:

\bfoo\b

wouldn't match some feeds at all. Now, with the unicode modifier (u) enabled for preg_match, it works as expected.
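To illustrate one way the u modifier can change what "\b" does, here is a minimal sketch. The title and pattern are made up for illustration (they are not taken from an affected feed), so this may not be the exact situation described above. Without u, PCRE works on bytes and, in the default C locale, does not treat the bytes of a UTF-8 "é" as word characters; with u, PHP enables Unicode-aware character classes, so "é" counts as a letter and "\b" finds a boundary next to it.

```php
<?php
// Hypothetical sample data, only for illustration.
$title   = "un \u{e9}t\u{e9} chaud";            // "un été chaud", UTF-8 encoded
$reg_exp = '\b' . "\u{e9}t\u{e9}" . '\b';       // the pattern \bété\b

// Same call shape as in the filter code: byte mode vs. unicode mode.
$bytes   = preg_match("/$reg_exp/i", $title);
$unicode = preg_match("/$reg_exp/iu", $title);

var_dump($bytes, $unicode);
```

In the default locale I would expect the first call to find nothing and the second to match, but the byte-mode result can depend on the locale PHP is running under.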

I don't know if there are any other (bad?) consequences to enabling unicode, but I have had the patch running for several months now, and all my filter regexps (with and without "\b") seem to work properly.
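One consequence that may be worth checking: with the u modifier, preg_match requires the subject to be valid UTF-8. If it is not, the call returns false and preg_last_error() reports PREG_BAD_UTF8_ERROR, so a filter would silently not match that entry. The sketch below uses a made-up title containing a lone Latin-1 byte; whether such strings can actually reach the filter code is an assumption I have not verified.

```php
<?php
// Hypothetical title with a stray Latin-1 byte (0xE9), i.e. not valid UTF-8.
$bad_title = "foo \xE9 bar";

// Byte mode does not validate the subject, so the ASCII word still matches.
$without_u = preg_match('/\bfoo\b/i', $bad_title);

// Unicode mode rejects the invalid subject and returns false.
$with_u = preg_match('/\bfoo\b/iu', $bad_title);
$error  = preg_last_error();

var_dump($without_u, $with_u, $error === PREG_BAD_UTF8_ERROR);
```

If this turns out to matter, checking preg_last_error() after a failed match would at least make the problem visible.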

Here is the patch:

Code:

diff --git a/include/rssfuncs.php b/include/rssfuncs.php
index 32bc69819b..ccc6d51545 100644
--- a/include/rssfuncs.php
+++ b/include/rssfuncs.php
@@ -1382,29 +1382,29 @@
 
             switch ($rule["type"]) {
             case "title":
-               $match = @preg_match("/$reg_exp/i", $title);
+               $match = @preg_match("/$reg_exp/iu", $title);
                break;
             case "content":
                // we don't need to deal with multiline regexps
                $content = preg_replace("/[\r\n\t]/", "", $content);
 
-               $match = @preg_match("/$reg_exp/i", $content);
+               $match = @preg_match("/$reg_exp/iu", $content);
                break;
             case "both":
                // we don't need to deal with multiline regexps
                $content = preg_replace("/[\r\n\t]/", "", $content);
 
-               $match = (@preg_match("/$reg_exp/i", $title) || @preg_match("/$reg_exp/i", $content));
+               $match = (@preg_match("/$reg_exp/iu", $title) || @preg_match("/$reg_exp/iu", $content));
                break;
             case "link":
-               $match = @preg_match("/$reg_exp/i", $link);
+               $match = @preg_match("/$reg_exp/iu", $link);
                break;
             case "author":
-               $match = @preg_match("/$reg_exp/i", $author);
+               $match = @preg_match("/$reg_exp/iu", $author);
                break;
             case "tag":
                foreach ($tags as $tag) {
-                  if (@preg_match("/$reg_exp/i", $tag)) {
+                  if (@preg_match("/$reg_exp/iu", $tag)) {
                      $match = true;
                      break;
                   }

Re: [Patch] enable unicode for filter regexps

Post by fox » 03 Oct 2016, 18:46

I don't think this should break anything (although I could be wrong). It would be cool if you refiled this as a GitLab merge request.

