Post.Office
Re: Bayesian filtering is driving me nuts.
Wow; that's a lot of work. Since Tenon has made such technogeek stuff easy and usable for the "rest of us" over the years, maybe they will devise a simple way to do this.
Something must give. Recently SPAM is out of control, and running SA out of the box, ClamAV and three RBLs still allows far too much SPAM to pass through to our users -- I have received 600+ per day that get by the above filters!
I tried going back to my refined subject and body_text filters (with 100 body_text phrases). They worked and trapped more SPAM, but a bunch of "misspelled" and spoofed SPAM still gets through, and searching for false positives takes about as long as deleting the excess SPAM -- and I'm so over looking for false positives.
Tenon -- help.
</vent>
Elton
On Jun 25, 2004, at 11:03 AM, Dan Tappin wrote:
You bet.
I have a bit of a custom / complicated system created.
I have all mail caught by SA rejected and passed to a holding account - a regular pop3 account which I do not check. I have Joe Savelberg's fabulous SA script which I have hacked to process each message caught and re-write the headers to include the SA score and reports. The script then moves the messages to an IMAP account that I use to manually review the messages.
I have Outlook rules that assign the messages a category based on the spam score (5, 6, 7, 8, 9, 10+, 15+, 20+, etc.), which I then use to group and preview the messages. If the score is over 25 I don't normally even look at the messages; I just delete them.
I then have a ham and a spam folder in the IMAP review account. I can drag and drop spam that gets by the filter, and do the same for ham that gets caught. I then have a cron script that runs every 5 minutes and clears out these folders via sa-learn. I also keep a copy of the ham and spam I feed to SA so that if I ever need to kill the bayes database (corruption) I do not need to start from scratch.
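A minimal sketch of what such a cron job might look like, not Dan's actual script. It assumes each review folder is stored as a single mbox file (the real IMAP store may differ), and the function names and paths are hypothetical. It keeps an archive copy, feeds the folder to `sa-learn` with `--spam` or `--ham`, then empties the folder.

```python
import shutil
import subprocess
from pathlib import Path


def learn_command(kind, mbox):
    """Build the sa-learn command line for one mbox file."""
    if kind not in ("spam", "ham"):
        raise ValueError("kind must be 'spam' or 'ham'")
    return ["sa-learn", f"--{kind}", "--mbox", str(mbox)]


def archive_and_learn(kind, mbox, archive_dir, run=subprocess.run):
    """Archive a folder, train on it, then clear it.

    Returns False if the folder is missing or empty, True otherwise.
    `run` is injectable so the logic can be exercised without sa-learn.
    """
    mbox = Path(mbox)
    if not mbox.exists() or mbox.stat().st_size == 0:
        return False

    # Keep a copy so the bayes database can be rebuilt from scratch later.
    archive_dir = Path(archive_dir)
    archive_dir.mkdir(parents=True, exist_ok=True)
    with open(mbox, "rb") as src, open(archive_dir / f"{kind}.mbox", "ab") as dst:
        shutil.copyfileobj(src, dst)

    # Train; check=True raises if sa-learn exits non-zero, so the folder
    # is only emptied after a successful run.
    run(learn_command(kind, mbox), check=True)
    mbox.write_bytes(b"")
    return True
```

Run it as the same user that owns the bayes files. Note that this sketch does no locking, so a message dropped into the folder while the job runs could be lost; a real script would need to guard against that.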
The SA plug-in needs to be changed so that the headers are changed by SA directly on the front end, and you need to be able to set a threshold to allow, review or reject a message based on the spam score (i.e. less than 10 allow, less than 25 forward to another address, more than 25 discard).
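The three-way threshold could be sketched roughly as below. This is only an illustration of the idea, not something the SA plug-in does today. The function names are hypothetical, the header parsing assumes an `X-Spam-Status` value carrying `score=` (or `hits=` in older SA releases), and a score of exactly 25 is treated as discard since the rule above leaves that case open.

```python
import re

# Matches "score=12.3" or the older "hits=12.3" form in X-Spam-Status.
_SCORE_RE = re.compile(r"\b(?:score|hits)=(-?\d+(?:\.\d+)?)")


def parse_score(status_header):
    """Extract the numeric spam score from an X-Spam-Status value, or None."""
    match = _SCORE_RE.search(status_header)
    return float(match.group(1)) if match else None


def route(score, review_at=10.0, discard_at=25.0):
    """Map a spam score to 'allow', 'review' or 'discard'."""
    if score < review_at:
        return "allow"
    if score < discard_at:
        return "review"  # e.g. forward to another address
    return "discard"
```

For example, `route(parse_score("Yes, score=12.3 required=5.0"))` gives `"review"`.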
Dan
-----Original Message-----
From: John Sievert [mailto:john@xxxxxxxxxxxxxxx]
Sent: Friday, June 25, 2004 9:46 AM
To: post_office@xxxxxxxxxxxxxxx
Subject: Re: Bayesian filtering is driving me nuts.
So, you do the training of the filters manually - by collecting a quantity of each in an mbox and processing it manually through the sa-learn utility?
J
--
“Insanity is doing the same thing over and over again and expecting different results.” - Albert Einstein
John Sievert
Customer 1st, Inc.
2950 Metro Drive, #101
Mpls, MN 55425
(952) 851-7901
(952) 851-7907 fax
This email and any attachments are confidential and may be privileged. This email is intended solely for the use of the intended recipient. If you are not the intended recipient, any use, disclosure or copying of this email and any attachments is strictly prohibited. If you receive this email in error, please notify the sender immediately by reply email and destroy the message and its attachments.
On Jun 25, 2004, at 8:43 AM, Dan Tappin wrote:
I picked 1 for ham because I didn't have many ham messages, and bayes won't kick in until you have the minimum number of spam AND ham. I disabled auto-learn because I had no control over the messages being added. I find the odd high-scoring ham and didn't want these auto-learned as spam.
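The "spam AND ham" condition can be pictured with a tiny sketch. The function name is hypothetical and the defaults are simply the values from the local.cf quoted further down the thread. The learned counts would come from the bayes database (for instance, the nspam and nham figures reported by `sa-learn --dump magic`).

```python
def bayes_active(nspam, nham, min_spam=200, min_ham=1):
    """True once both learned-message counts reach their minimums."""
    return nspam >= min_spam and nham >= min_ham
```

With these settings, 500 learned spam and no ham still leaves bayes inactive, while 200 spam and a single ham turns it on.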
Dan
-----Original Message-----
From: John Sievert [mailto:john@xxxxxxxxxxxxxxx]
Sent: Thursday, June 24, 2004 10:41 PM
To: post_office@xxxxxxxxxxxxxxx
Subject: Re: Bayesian filtering is driving me nuts.
Thanks Dan, I'll give this a try. Some stuff in here I didn't think of.
Question:
The min ham and spam numbers. Why did you pick these? Also, why not auto learn?
I'm hopeful this will really help with the spam detection. I use SpamSieve on my laptop and it has been hugely successful using the same technology.
J
"You can only be young once. But you can always be immature." - Dave Barry
John Sievert
Customer 1st, Inc.
2950 Metro Drive, Suite 101
Mpls, MN 55425
952.851.7901 office
952.851.7907 fax
On Jun 24, 2004, at 4:47 PM, Dan Tappin wrote:
It's ugly I agree.
Here is my SA config:
Make a directory '.spamassassin' in /var/spool/post.office:
drwxr-x--- 5 mta mail 170 Jun 24 15:02 .spamassassin
Edit your /etc/mail/spamassassin/local.cf:
## Misc
use_bayes 1
bayes_file_mode 0777
bayes_path /var/spool/post.office/.spamassassin/bayes
bayes_auto_expire 1
bayes_learn_to_journal 0
bayes_auto_learn 0
bayes_min_ham_num 1
bayes_min_spam_num 200
Ensure that your SA start-up script uses the '-u mta' option which runs SA as the same user as PO.
This set-up was the only way I have ever made it work properly. SA seems to pick a different location for the bayes database each time unless you force / trick it.
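To double-check that the directory matches the listing above (mode `drwxr-x---`, owner `mta`, group `mail`), something like the following could be used. It is a sketch for a Unix-like system with hypothetical function names, and it only inspects the directory without changing anything.

```python
import grp
import os
import pwd
import stat


def _name(lookup, ident):
    """Resolve a uid/gid to a name, falling back to the number."""
    try:
        return lookup(ident)[0]
    except KeyError:
        return str(ident)


def describe_dir(path):
    """Return the ls-style mode string, owner and group of a path."""
    st = os.stat(path)
    return {
        "mode": stat.filemode(st.st_mode),
        "owner": _name(pwd.getpwuid, st.st_uid),
        "group": _name(grp.getgrgid, st.st_gid),
    }


def matches_listing(path, mode="drwxr-x---", owner="mta", group="mail"):
    """Compare a path against the expected mode, owner and group."""
    return describe_dir(path) == {"mode": mode, "owner": owner, "group": group}
```

For example, `matches_listing("/var/spool/post.office/.spamassassin")` should return `True` on a box set up as described.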
Dan
-----Original Message-----
From: John Sievert [mailto:john@xxxxxxxxxxxxxxx]
Sent: Thursday, June 24, 2004 2:23 PM
To: post_office@xxxxxxxxxxxxxxx
Subject: Bayesian filtering is driving me nuts.
I think I can improve my filtering big time with the Bayesian filtering for SpamAssassin, but I can't make it work. Here is what I get from the log:
Jun 24 14:48:38 mail spamd[2587]: debug: auto-learning failed: lock: 2587 cannot create tmp lockfile /private/var/root/.spamassassin/.lock.mail.customer1st.com.2587 for /private/var/root/.spamassassin/.lock: Permission denied
I have run spamd as root. I have changed the permissions on the dir so that it is rw everywhere, but I still get the same problem.
Any suggestions?
J
--
Give a man a fish and you feed him for a day; teach him to use the Net and he won't bother you for weeks.
John Sievert
Customer 1st, Inc.
2950 Metro Drive, Suite 101
Minneapolis, MN 55425
(952) 851-7901
---------
Tenon Intersystems' Post.Office Mailing List
To unsubscribe: send mailto:post_office-request@xxxxxxxxxxxxxxx
with the body only containing:
unsubscribe
Find the searchable mailing list archives at:
http://postoffice.computeroil.com/
Copyright©2003 Tenon Intersystems, 232 Anacapa Street, Suite 2A, Santa Barbara,
CA 93101. All rights reserved.