Question: The "Tenon SPAM Filter" uses some complex regular expressions. Can you explain them in a little more detail?
Answer: First, take a look at the Tenon spam filter as it looks in a Post.Office form.
Now, let's take the first filter:
Subject: *\s{10,}
The expression:
\s means any single whitespace character (a space, a tab, and so on)
\s{M,N} means at least M and at most N whitespace characters in a row
Therefore:
\s{10,} means at least 10 whitespace characters in a row
\s{10} means exactly 10 whitespace characters in a row
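The counting notation above can be tried out in a few lines of Python. This is only a sketch: Python's re module stands in for the filter's own matching engine, which may differ in details, and the subject lines and the has_long_gap helper are made up for illustration.

```python
import re

# Assumption: Python's re module approximates the filter's regex engine.
at_least_ten = re.compile(r"\s{10,}")  # 10 or more whitespace characters in a row
exactly_ten = re.compile(r"\s{10}")    # exactly 10 whitespace characters in a row


def has_long_gap(subject: str) -> bool:
    """Return True if the subject contains a run of 10+ whitespace characters."""
    return at_least_ten.search(subject) is not None


# Hypothetical subject lines, not taken from real mail.
padded = "Make money fast" + " " * 12 + "xq7z"
normal = "Meeting notes for Tuesday"

print(has_long_gap(padded))
print(has_long_gap(normal))
```

Spammers often pad a subject with a long run of spaces to push a tracking code out of view, which is the kind of subject the first filter is aimed at.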
Here's some more useful notation:
^ means the beginning of a line (beginning with)
$ means the end of a line (ending with)
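As a rough illustration of the two anchors, here is a short Python sketch. The patterns and subject lines are hypothetical examples, not part of the Tenon filter, and Python's re module is assumed to behave like the filter's engine for these simple cases.

```python
import re

# Hypothetical patterns chosen only to show the anchors.
starts_with_free = re.compile(r"^free")  # "free" at the very beginning
ends_with_bangs = re.compile(r"!!!$")    # "!!!" at the very end

print(starts_with_free.search("free offer inside") is not None)
print(starts_with_free.search("totally free offer") is not None)
print(ends_with_bangs.search("act now!!!") is not None)
print(ends_with_bangs.search("act now!!! today") is not None)
```

Without the anchor, "free" would match anywhere in the line; with ^ it matches only when the line begins with it.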
And, of course, the asterisk '*' is a wildcard, so
*enis* matches any text containing the contiguous letters 'enis' (which SPAMmers are starting to use now).
And | means OR, so *enis|viagra* would filter anything containing either of those character strings.
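The OR filter can be approximated in Python as follows. This is a sketch under assumptions: in Python's re module a bare asterisk is not a wildcard, so the leading and trailing '*' of the filter are replaced by search(), which already looks for the pattern anywhere in the text. The is_blocked helper and the sample subjects are hypothetical.

```python
import re

# Assumption: search() plays the role of the filter's surrounding '*' wildcards.
blocked = re.compile(r"enis|viagra")  # either character string, anywhere


def is_blocked(subject: str) -> bool:
    """Return True if the subject contains 'enis' or 'viagra'."""
    return blocked.search(subject) is not None


print(is_blocked("cheap viagra here"))
print(is_blocked("quarterly budget review"))
print(is_blocked("lunch with denise"))
```

The last subject is also caught because "denise" contains the letters 'enis', so a filter like this can block legitimate mail as well.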
Because regular expressions use certain characters in a special way, it is important to escape those characters if you want to filter for them literally. These characters include:
*[]^$+-?|(){}
To escape a character, simply put a backslash ("\") before it.
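A small Python sketch shows why escaping matters. It assumes Python's re module as a stand-in for the filter's engine; the sample strings are invented. The re.escape function is a Python convenience that adds the backslashes automatically and is not something the filter form itself provides.

```python
import re

text = "1+1=2"

# Unescaped, '+' means "one or more of the previous character",
# so this pattern looks for two or more 1s in a row and finds none.
unescaped = re.search(r"1+1", text)

# Escaped, '\+' matches a literal plus sign.
escaped = re.search(r"1\+1", text)

print(unescaped is None)
print(escaped.group(0))

# re.escape() backslash-escapes the special characters in a literal string.
literal = re.compile(re.escape("(100% free)"))
print(literal.search("now (100% free) today") is not None)
```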
Last updated 10.03.2003