Discussion:
Strange postfix behavior - SMTPD connects and gets locked
r***@gmail.com
2016-12-09 20:42:25 UTC
Hi everyone,

I'm seeing some very strange Postfix behavior.

I run VMware in two different datacenters.

This machine worked perfectly in DC-A.

After moving it to DC-B, Postfix takes too long to respond to HELO / EHLO commands. In some cases the client hangs and the connection freezes.

My first idea was to check DNS. I tried Google's and some other public DNS servers, but that made it even worse (hmm, this might be a point of failure). The best performance I get is with a local DNS server.
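
A minimal sketch for comparing resolvers from the affected box: it times one reverse and one forward lookup using whatever resolver the system is currently configured with. The helper names `time_lookup` and `check_client` are made up for illustration, and the IP and hostname passed in are whatever client you are testing from.

```python
import socket
import time


def time_lookup(resolve, target):
    """Call resolve(target) and return (result_or_None, seconds_elapsed)."""
    start = time.monotonic()
    try:
        result = resolve(target)
    except OSError:
        # socket.gaierror and socket.herror are both subclasses of OSError
        result = None
    return result, time.monotonic() - start


def check_client(ip, hostname):
    """Time the reverse (PTR) and forward lookups for one client."""
    ptr, ptr_secs = time_lookup(socket.gethostbyaddr, ip)
    fwd, fwd_secs = time_lookup(socket.gethostbyname, hostname)
    return {"ptr": ptr, "ptr_secs": ptr_secs, "fwd": fwd, "fwd_secs": fwd_secs}
```

Running `check_client(...)` once per candidate DNS server (after switching resolv.conf) gives numbers that can be compared directly between DC-A and DC-B.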

After that, I tried reinstalling Gentoo, and once everything was back up I had the same problem.

I started by sending a single email. When I open about 40 consecutive connections to different IPs (Postfix listens on 250 IPs on this same box), it more or less stops working properly.

I can connect to SMTP on port 25, but after I type HELO HOST the connection freezes.

In mail.log, there are some unusual entries:

connect from unknown[unknown]
Dec 9 20:13:24 smtp-dev-spo postfix/smtpd[393313]: lost connection after CONNECT from unknown[unknown]
Dec 9 20:13:24 smtp-dev-spo postfix/smtpd[393313]: disconnect from unknown[unknown] commands=0/0
Dec 9 20:13:38 smtp-dev-spo postfix/anvil[396384]: statistics: max connection rate 1/60s for (smtp:unknown) at Dec 9 20:10:18
Dec 9 20:13:38 smtp-dev-spo postfix/anvil[396384]: statistics: max connection count 1 for (smtp:unknown) at Dec 9 20:10:18

The "unknown" entries caught my attention. I'm connecting from a local (public) IP.
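
To see how often this happens and at which SMTP stage, the log can be tallied. A small sketch, assuming the log lines look like the ones quoted in this thread; the function name `count_lost_connections` is made up for illustration.

```python
import re
from collections import Counter

# Matches smtpd lines such as:
#   ... postfix/smtpd[393313]: lost connection after CONNECT from unknown[unknown]
LOST_RE = re.compile(
    r"postfix/smtpd\[\d+\]: lost connection after (\S+) from (\S+)\[([^\]]+)\]"
)


def count_lost_connections(lines):
    """Count lost connections per (SMTP stage, client address)."""
    counts = Counter()
    for line in lines:
        match = LOST_RE.search(line)
        if match:
            stage, _hostname, address = match.groups()
            counts[(stage, address)] += 1
    return counts
```

Feeding it the mail log (for example `count_lost_connections(open(path))`, with `path` pointing at your log file) separates clients lost right after CONNECT from those lost after EHLO.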

I then tried disabling anvil and scache, but that made it even worse.

Another strange behavior I noticed: when I connect to SMTP and wait after the EHLO command, the server returns this:

cache btree:/var/lib/mailservers/XXXX.CCC.com/verify_cache full cleanup: retained=0 dropped=0 entries

I have 250 instances of Postfix running on the same box. It works smoothly in DC-A, but I can't get it to work in the other one.

I'm using Gentoo with Postfix + SASL + Courier authlib.

MySQL runs locally, but I switched to another large server we have and the symptom persists.

Any tips would be much appreciated.

Thanks a lot.

BR,

Rafael
Rafael Azevedo
2016-12-09 20:52:53 UTC
I'm also getting this error message:

Dec 9 18:51:02 smtp-dev-spo postfix/smtpd[450802]: lost connection after EHLO from XXX.CCCC.com[1.2.3.4]
Dec 9 18:51:02 smtp-dev-spo postfix/smtpd[450802]: disconnect from XXX.CCCC.com[1.2.3.4] ehlo=1 commands=1

Thanks a lot.
Rafael Azevedo
2016-12-09 20:56:20 UTC
More debug:

Dec 9 18:51:02 smtp-dev-spo postfix/smtpd[450802]: >>> END Helo command RESTRICTIONS <<<
Dec 9 18:51:02 smtp-dev-spo postfix/smtpd[450802]: match_list_match: AAAA.BBBB.COM: no match
Dec 9 18:51:02 smtp-dev-spo postfix/smtpd[450802]: match_list_match: 1.2.3.4: no match
Dec 9 18:51:02 smtp-dev-spo postfix/smtpd[450802]: > AAAA.BBBB.COM[1.2.3.4] 250-nodespo16.corporate-mail-us.com
Dec 9 18:51:02 smtp-dev-spo postfix/smtpd[450802]: > AAAA.BBBB.COM[1.2.3.4] 250-PIPELINING
Dec 9 18:51:02 smtp-dev-spo postfix/smtpd[450802]: > AAAA.BBBB.COM[1.2.3.4] 250-SIZE 1024000
Dec 9 18:51:02 smtp-dev-spo postfix/smtpd[450802]: > AAAA.BBBB.COM[1.2.3.4] 250-ETRN
Dec 9 18:51:02 smtp-dev-spo postfix/smtpd[450802]: > AAAA.BBBB.COM[1.2.3.4] 250-AUTH LOGIN PLAIN
Dec 9 18:51:02 smtp-dev-spo postfix/smtpd[450802]: > AAAA.BBBB.COM[1.2.3.4] 250-AUTH=LOGIN PLAIN
Dec 9 18:51:02 smtp-dev-spo postfix/smtpd[450802]: > AAAA.BBBB.COM[1.2.3.4] 250-ENHANCEDSTATUSCODES
Dec 9 18:51:02 smtp-dev-spo postfix/smtpd[450802]: > AAAA.BBBB.COM[1.2.3.4] 250-8BITMIME
Dec 9 18:51:02 smtp-dev-spo postfix/smtpd[450802]: > AAAA.BBBB.COM[1.2.3.4] 250-DSN
Dec 9 18:51:02 smtp-dev-spo postfix/smtpd[450802]: > AAAA.BBBB.COM[1.2.3.4] 250 SMTPUTF8
Dec 9 18:51:02 smtp-dev-spo postfix/smtpd[450802]: watchdog_pat: 0x55bdd789f4e0
Dec 9 18:51:02 smtp-dev-spo postfix/smtpd[450802]: smtp_get: EOF
Dec 9 18:51:02 smtp-dev-spo postfix/smtpd[450802]: match_hostname: smtpd_client_event_limit_exceptions: AAAA.BBBB.COM ~? 168.100.189.0/28
Dec 9 18:51:02 smtp-dev-spo postfix/smtpd[450802]: match_hostaddr: smtpd_client_event_limit_exceptions: 1.2.3.4 ~? 168.100.189.0/28
Dec 9 18:51:02 smtp-dev-spo postfix/smtpd[450802]: match_hostname: smtpd_client_event_limit_exceptions: AAAA.BBBB.COM ~? 127.0.0.0/8
Dec 9 18:51:02 smtp-dev-spo postfix/smtpd[450802]: match_hostaddr: smtpd_client_event_limit_exceptions: 1.2.3.4 ~? 127.0.0.0/8
Dec 9 18:51:02 smtp-dev-spo postfix/smtpd[450802]: match_hostname: smtpd_client_event_limit_exceptions: AAAA.BBBB.COM ~? 10.20.30.0/24
Dec 9 18:51:02 smtp-dev-spo postfix/smtpd[450802]: match_hostaddr: smtpd_client_event_limit_exceptions: 1.2.3.4 ~? 10.20.30.0/24
Dec 9 18:51:02 smtp-dev-spo postfix/smtpd[450802]: match_hostname: smtpd_client_event_limit_exceptions: AAAA.BBBB.COM ~? 177.85.0.0/24
Dec 9 18:51:02 smtp-dev-spo postfix/smtpd[450802]: match_hostaddr: smtpd_client_event_limit_exceptions: 1.2.3.4 ~? 177.85.0.0/24
Dec 9 18:51:02 smtp-dev-spo postfix/smtpd[450802]: match_hostname: smtpd_client_event_limit_exceptions: AAAA.BBBB.COM ~? 168.196.188.0/22
Dec 9 18:51:02 smtp-dev-spo postfix/smtpd[450802]: match_hostaddr: smtpd_client_event_limit_exceptions: 1.2.3.4 ~? 1.2.3.4/22
Dec 9 18:51:02 smtp-dev-spo postfix/smtpd[450802]: lost connection after EHLO from AAAA.BBBB.COM[1.2.3.4]
Dec 9 18:51:02 smtp-dev-spo postfix/smtpd[450802]: disconnect from AAAA.BBBB.COM[1.2.3.4] ehlo=1 commands=1


Domains and IPs have been changed on purpose.

Thanks!
Rafael Azevedo
2016-12-09 21:08:28 UTC
Ohhh..

I forgot to mention: memory usage is at 50% and CPU at 0.4.
Rafael Azevedo
2016-12-15 16:53:30 UTC
Permalink
Hello everyone,

Since I was alone in the dark, I had no option but to fight this problem myself.

I finally got it working.

It was a sysctl tuning problem.

I had to add the following line to sysctl.conf to get it working:

kernel.sysrq = 1

After that, Postfix went back to working.
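
For anyone chasing a similar difference between two hosts, a quick way to spot diverging kernel settings is to dump `sysctl -a` on the working and the broken machine and diff the two. A minimal sketch; `parse_sysctl` and `diff_sysctl` are hypothetical helper names, and it assumes one `key = value` pair per line.

```python
def parse_sysctl(text):
    """Parse `sysctl -a` style output ("key = value" per line) into a dict."""
    settings = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip()
    return settings


def diff_sysctl(good_text, bad_text):
    """Return {key: (good_value, bad_value)} for every key that differs."""
    good, bad = parse_sysctl(good_text), parse_sysctl(bad_text)
    return {
        key: (good.get(key), bad.get(key))
        for key in sorted(good.keys() | bad.keys())
        if good.get(key) != bad.get(key)
    }
```

With the output of `sysctl -a` saved to a file on each host, `diff_sysctl(open(a).read(), open(b).read())` lists only the keys whose values differ; a key missing on one side shows up with `None` for that side.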

Thanks everyone.

R
