Disclaimer
This is an In Progress post. It is incomplete and of poorer quality than my other posts. It’s an experiment in encouraging me to publish more often.
Picture this. You’re searching a large amount of code for some kind of statement/expression that has a pretty non-specific form. Obviously, the best option would be to use some kind of matching tool that knows about the grammar/syntax of the programming language that the code is written in.
But we aren’t always lucky enough to have such a tool at hand, or maybe we’re searching through many code bases each with their own idiosyncratic build systems.
So we fall back on regular expressions and our favourite Unix tools, e.g. grep or rg/ripgrep.
Now it is well known that, in general, programming languages can’t be parsed using regular expressions (and if you try, you’re liable to summon unholy Eldritch beings). So you can’t hope to match every case of some general syntactic form. But what if you don’t care about 100% accuracy and you just want to do a “good enough” job?
Then by all means use a regular expression!
The problem is that, if the code base is big enough, a quick first-attempt regex is going to miss a lot, and you want to get the percentage of what you miss down to less than, say, 1%.
If you follow this process you can craft a convoluted (but pretty good!) regex in short order.
regexr
Recently ƎDOↃ Security shared with me an online tool that allows you to do the same as what I suggested above with a text file: regexr. You can collect examples of what you want to match in the textbox at the top and then iteratively craft your regex until you have what you want. Be aware that there are subtle differences between some of the regex languages, so something you craft in regexr might not have the same result in grep or rg. Then again, in most cases, it will.
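If you would rather keep the collect-examples-then-iterate loop on your own machine, something like the following rough Python sketch might do. Everything in it is an assumption made for illustration: the example lines, the idea that we are hunting for calls to a function named foo, and the evaluate helper are all hypothetical. Python's re module is yet another regex dialect, so the same caveat about grep and rg applies here too.

```python
import re

# Hypothetical examples of lines we want the regex to catch ...
SHOULD_MATCH = [
    "x = foo(a, b)",
    "result = foo( a )",
    "foo()",
]
# ... and lines it should leave alone.
SHOULD_NOT_MATCH = [
    "x = foobar(a)",
    "# foo is mentioned in a comment",
]


def evaluate(pattern, should_match, should_not_match):
    """Return (missed, false_hits) for a candidate pattern."""
    regex = re.compile(pattern)
    missed = [line for line in should_match if not regex.search(line)]
    false_hits = [line for line in should_not_match if regex.search(line)]
    return missed, false_hits


if __name__ == "__main__":
    # Iterate: start with a naive pattern, then refine it.
    candidates = [r"foo\(\w", r"\bfoo\s*\("]
    for pattern in candidates:
        missed, false_hits = evaluate(pattern, SHOULD_MATCH, SHOULD_NOT_MATCH)
        miss_pct = 100 * len(missed) / len(SHOULD_MATCH)
        print(f"{pattern!r}: missed {miss_pct:.0f}%, false hits: {len(false_hits)}")
        for line in missed:
            print(f"  missed: {line}")
```

The printed miss percentage only describes the examples you collected, so it is as trustworthy as those examples are representative of the real code base.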