I am trying to exclude a group of words but include another group of words in a QRegExp expression, but I am having trouble figuring this out.

Here are some of the things I tried (this example includes all of the words):

(words|I|want|to|include)(?!the|ones|that|should|not|match)

I also tried this (which returned nothing):

^(words|I|want|to|include)(?:(?!the|ones|that|should|not|match).)*$

Edit: The reason why I need such an unusual regex (include/exclude) is that I want to search through a series of articles and filter the ones that have the included words in them, but not if they also have the excluded words in them. So for example, if article A is: Lorem ipsum dolor sit amet, consectetur adipiscing elit. And article B is: Vivamus fermentum semper porta. Then a regex that includes lorem would filter article A but not B. But if ipsum is a word that I'm excluding, I do not want article A to be filtered. I considered using one regex to filter the articles with the words that I want and then running a second regex to exclude, from that first set, the articles that I do not want, but unfortunately the software I am using does not allow me to do this.

Use excluded words as alternatives inside an anchored negative look-ahead. I think there is no need for a tempered greedy quantifier.

You say you have Lorem ipsum dolor sit amet, consectetur adipiscing elit., and you want it to match since it contains the word lorem. The regex is \\blorem\\b (with QRegExp.CaseInsensitive set to 1), where \b is used to force whole-word matching.

To prevent the match when the string contains the word ipsum, you need to put a negative lookahead at the very beginning of the string. With that in place, the pattern does not match the string in question.

To add more alternatives, we can use the alternation operator |, like this:

^(?!.*\\b(?:words|to|exclude)\\b).*\\b(?:words|to|include)\\b

Note the use of non-capturing groups: they do not store any captured text and potentially improve performance compared to capturing groups, which save the matched text in a buffer.

At the demo Web site, single backslashes must be used; I am doubling them here for the QRegExp. At the demo Web site, the dot does not match newline symbols, whereas in QRegExp the dot in the pattern matches any character including a newline. You may want to replace the dot with something that excludes newlines if you need the same behavior, but I think it is not necessary.

See Debuggex Demo (with matching and non-matching examples).

Note: The above assumes QRegExp supports variable-length lookahead - I haven't verified this.

Explanation:

- The words are assumed to be whole words (include "word" but not "sword" or "words"), so they are wrapped in \b on either side.
- For the words you want to include, it only matters that at least one of them appears at least once, so that is all the middle group searches for.
- None of the words in the exclude list may appear before or after the searched-for word, hence the need for an "exclusion group" on either side of it.
- Exclusion groups are implemented using a method that is explained very well in this answer.
- The first exclusion group uses *? to make it non-greedy, so it doesn't consume the whole text and stops as soon as the searched-for word is found.
- The regular expression is wrapped in ^ and $ to ensure the whole string is checked/matched, not just part of it.
- All groups are marked as non-capturing by using ?: immediately after the opening parenthesis.
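
The include/exclude filter discussed above can be tried out in a small script. The following is only a sketch, and it uses Python's `re` module as a stand-in for QRegExp, which is an assumption: QRegExp's dialect differs from Python's in places, so the patterns should be checked against Qt before use. The helper names (`build_lookahead_filter`, `build_tempered_filter`) are hypothetical. The second pattern is a reconstruction based on the prose description of "exclusion groups" on either side of the included word, not a pattern quoted from the thread. Backslashes are single here because raw strings are used; `re.DOTALL` is set so the dot also matches newlines, mirroring what the thread says about QRegExp.

```python
import re

ARTICLE_A = "Lorem ipsum dolor sit amet, consectetur adipiscing elit."
ARTICLE_B = "Vivamus fermentum semper porta."

FLAGS = re.IGNORECASE | re.DOTALL


def _alternation(words):
    """Join words with "|" after escaping them so they match literally."""
    if not words:
        # An empty alternation would match the empty string and break the logic.
        raise ValueError("word list must not be empty")
    return "|".join(re.escape(w) for w in words)


def build_lookahead_filter(include, exclude):
    """Anchored negative lookahead: fail if any excluded whole word occurs
    anywhere in the text, then require at least one included whole word."""
    inc = _alternation(include)
    exc = _alternation(exclude)
    return re.compile(rf"^(?!.*\b(?:{exc})\b).*\b(?:{inc})\b", FLAGS)


def build_tempered_filter(include, exclude):
    """Assumed reconstruction of the 'exclusion group' variant: every
    character before and after the included word is consumed only if no
    excluded whole word starts at that position."""
    inc = _alternation(include)
    exc = _alternation(exclude)
    guard = rf"(?:(?!\b(?:{exc})\b).)"
    return re.compile(rf"^{guard}*?\b(?:{inc})\b{guard}*$", FLAGS)


def keep(article, pattern):
    """Return True when the article passes the include/exclude filter."""
    return pattern.search(article) is not None


lookahead = build_lookahead_filter(["lorem"], ["ipsum"])
tempered = build_tempered_filter(["lorem"], ["ipsum"])

for name, text in [("A", ARTICLE_A), ("B", ARTICLE_B)]:
    print(name, keep(text, lookahead), keep(text, tempered))
```

With lorem included and ipsum excluded, both variants reject article A (it contains ipsum) and article B (it lacks lorem), while a text containing lorem but not ipsum passes. The lookahead form scans for excluded words once from the start of the string, which is why the first answer treats the per-character guard as unnecessary.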