Post.Office
grep spam subject filter on 6 consonants in a row strings
You've seen those subjects with random strings of letters,
like this one that I just caught:
Subject: "Men, Never Be Embarassed Again agxjzphyk kmof"
Here is a grep filter that you can put in the subject field to trap spams
that use long subject strings made up of nonsense.
While it won't find nonsense directly (would that it could), this grep
statement finds strings of six or more consonants in a row.
Here are the words in the /usr/share/dict/words file that it found:
[shell:~] dan% cat /usr/share/dict/words | grep '.*[bcdfghjklmnpqrstvwxz][bcdfghjklmnpqrstvwxz][bcdfghjklmnpqrstvwxz][bcdfghjklmnpqrstvwxz][bcdfghjklmnpqrstvwxz][bcdfghjklmnpqrstvwxz].*'
archchronicler
bergschrund
Eschscholtzia
fruchtschiefer
latchstring
lengthsman
Nachschlag
postphthisic
veldtschoen
There are other words with six or more consonants in a row that are not in the dict file:
weltschmerz
hirschsprung's disease
catchphrase
hertzsprung-russell diagram
knightsbridge
borschts
hantzschia
If you are likely to see any of these words in your valid message subjects,
either don't use this filter or put the words in a not statement. For example,
if you are German and use a lot of German words in your subjects, this filter
will probably stop many of them!
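
The idea of a not statement can be tried out in the terminal before touching the mail server. What follows is only a sketch in standard grep, not Post.Office syntax: the helper name flag_subjects and the sample allow-list are made up for illustration, and how PO itself expresses a not statement is not shown here.

```shell
#!/bin/sh
# Sketch only: standard grep, not Post.Office filter syntax.
# flag_subjects is a hypothetical helper name, not part of any tool.

CONS='[bcdfghjklmnpqrstvwxz]'

# Reads subjects on stdin, one per line. Prints those containing six or
# more consonants in a row, unless they also contain an allow-listed word.
# $1: non-empty extended regex of allowed words,
#     e.g. 'weltschmerz|catchphrase' (matched case-insensitively)
flag_subjects() {
    grep -E "${CONS}{6}" | grep -E -i -v "$1"
}

# The spam subject is printed; the legitimate German one is not.
printf '%s\n' \
    'Men, Never Be Embarassed Again agxjzphyk kmof' \
    'Weltschmerz in modern poetry' |
    flag_subjects 'weltschmerz|catchphrase'
```

One trade-off of this approach: a spam subject that happens to contain an allow-listed word slips through as well, so the list is best kept short.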
However, these are good words to cut and paste into your test messages if
you decide to test this filter. My filter caught the four that I tested.
Note that the syntax for PO grep is different from that of Mac OS X Terminal
(tcsh) grep. Here is the syntax you need for PO grep (no dots at the
beginning or end):
*[bcdfghjklmnpqrstvwxz][bcdfghjklmnpqrstvwxz][bcdfghjklmnpqrstvwxz][bcdfghjklmnpqrstvwxz][bcdfghjklmnpqrstvwxz][bcdfghjklmnpqrstvwxz]*
If you try this syntax with grep in tcsh you will not find any matches.
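
The reason standard grep finds nothing is the leading star. In POSIX basic regular expressions, which grep uses by default, a star at the very start of a pattern is an ordinary character, so the PO-style pattern asks for a literal asterisk followed by consonants. The sketch below shows this in the terminal; the variable names and the count_matches helper are made up for illustration, and it says nothing about how PO itself interprets the star.

```shell
#!/bin/sh
# Sketch only: shows how standard grep reads the PO-style pattern.
# CLASS, PO_STYLE, TERMINAL_STYLE and count_matches are illustrative names.

CLASS='[bcdfghjklmnpqrstvwxz]'
PO_STYLE="*${CLASS}${CLASS}${CLASS}${CLASS}${CLASS}${CLASS}*"
TERMINAL_STYLE="${CLASS}${CLASS}${CLASS}${CLASS}${CLASS}${CLASS}"

# Counts the stdin lines matching pattern $1 (basic regular expression).
# '|| true' hides grep's non-zero exit status when the count is 0.
count_matches() {
    grep -c -e "$1" || true
}

# Leading * is taken literally, so a plain word cannot match: prints 0
printf '%s\n' latchstring | count_matches "$PO_STYLE"

# Without the stars the six consonants "tchstr" are found: prints 1
printf '%s\n' latchstring | count_matches "$TERMINAL_STYLE"
```

The surrounding dots and stars are not needed in terminal grep in any case, since grep matches anywhere in the line unless the pattern is anchored.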
A while ago I asked if there was any documentation on grep in PO. Anita from
tenon said that it was 'the same as OSX' but I am finding that the syntax is
different. Any documentation on PO grep would help in devising better spam
filters. I find it hard to test in tcsh and then apply the grep
patterns in PO, only to see them break. Any help on this front is appreciated.
Eric, do you have more information on PO's grep?
For instance, one can do this more concisely with:
egrep '[bcdfghjklmnpqrstvwxz]{6}'
but PO doesn't seem to know what to do with certain egrep constructs and
(as noted above) it doesn't deal with dots and stars in the same way.
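
For testing in the terminal, the concise form can be wrapped up as below. This is a sketch for standard grep only (grep -E is the current spelling of egrep); the function names are made up for illustration, and whether PO accepts the {6} repeat count or any case-insensitive option is not known. Note that the character class lists lowercase letters only, so without -i a subject of capitalized nonsense is not matched.

```shell
#!/bin/sh
# Sketch only: terminal grep, not Post.Office filter syntax.
# SIX and both function names are illustrative.

SIX='[bcdfghjklmnpqrstvwxz]{6}'

# Case-sensitive: same matches as writing the class six times in a row.
six_consonants() {
    grep -E "$SIX"
}

# Case-insensitive variant: also matches uppercase consonant runs.
six_consonants_any_case() {
    grep -E -i "$SIX"
}

# Prints archchronicler and AGXJZPHYK, but not hello.
printf '%s\n' archchronicler hello AGXJZPHYK | six_consonants_any_case
```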
Caveat: use at your own risk. Note that I have applied this only to the
subject field, not to the email body field.
dan
---------
Tenon Intersystems' Post.Office Mailing List
To unsubscribe: send mailto:post_office-request@xxxxxxxxxxxxxxx
with the body only containing:
unsubscribe
Find the searchable mailing list archives at:
http://postoffice.computeroil.com/
Copyright©2003 Tenon Intersystems, 232 Anacapa Street, Suite 2A, Santa Barbara,
CA 93101. All rights reserved.
Questions about our website - Contact:
webmaster@tenon.com.