Discussion:
destination_rate_delay and connection_reuse_time_limit
Rafael Azevedo - IAGENTE
2013-01-07 13:12:22 UTC
Guys,

I've identified a misbehavior in Postfix.

I use destination_rate_delay for a specific transport queue, and I found out that the connection cache stops working when I have transport_destination_rate_delay > 1s.

If I raise destination_rate_delay above 1s, the connection cache won't work. Changing it back to 1s, everything works perfectly.

The problem is that we're having a hard time delivering email to a specific destination; that's why I have a dedicated transport, so I can manage its queue. But I'm still getting the same error… "You've reached the sending limit to this domain".

So this is what I have:
specificTransport_destination_concurrency_limit = 4
specificTransport_destination_rate_delay = 1s
specificTransport_connection_cache_reuse_limit = 100
specificTransport_bounce_queue_lifetime = 6h
specificTransport_maximal_queue_lifetime = 12h
specificTransport_connection_cache_time_limit = 30s
specificTransport_connection_reuse_time_limit = 600s
specificTransport_connection_cache_destinations = static:all

What I really need is to send only one email every 5 seconds, keeping the connection open between messages. I can't have more than 20 open connections in a 10-minute timeframe, or I'll get blocked.

Is there any way I can do this?

Can anybody help please?

Thanks.

Att.
--
Rafael Azevedo | IAGENTE
Fone: 51 3086.0262
MSN: ***@hotmail.com
Visite: www.iagente.com.br
Wietse Venema
2013-01-07 13:28:51 UTC
Post by Rafael Azevedo - IAGENTE
I use destination_rate_delay for a specific transport queue, and
I found out that the connection cache stops working when I have
transport_destination_rate_delay > 1s.
The default time limit is 2s, and it is enforced in multiple places.
You have found only one.

As Postfix documentation says, you must not increase these time
limits without permission from the receiver. Connection caching
is a performance tool, not a tool to circumvent receiver policies.

Wietse
Rafael Azevedo - IAGENTE
2013-01-07 13:34:45 UTC
Hi Wietse,

I don't really get it. I'm sure Postfix has a way to solve this issue.

This is what I'm trying to do:

- I need to have only one process for this transport's queue.
- This queue must respect the destination's policy, so I can't have more than 20 open connections in a 10-minute timeframe. That's why I want to use the connection cache.

With my configuration, I have only one process for this transport, and I'm also throttling delivery, waiting 1 second after each sent message before sending another one.

And since this transport handles only specific domains, I really don't have to worry about receiver policies, because they told me to send as much as I can using the same connection, avoiding opening one connection per message.

What do you recommend I do? Can you help me out with tuning this?

Thanks.

Att.
Wietse Venema
2013-01-07 14:17:46 UTC
Post by Rafael Azevedo - IAGENTE
Hi Wietse,
I don't really get it. I'm sure Postfix has a way to solve this issue.
I told you that there are two parameters that enforce the time limit.

Wietse
Rafael Azevedo - IAGENTE
2013-01-07 14:20:19 UTC
Could you please refresh my mind?

Thanks.

Att.
Viktor Dukhovni
2013-01-07 16:25:32 UTC
Post by Rafael Azevedo - IAGENTE
- I need to have only one process for this transport's queue.
mumble_destination_concurrency_limit = 1
Post by Rafael Azevedo - IAGENTE
- This queue must respect the destination's policy, so I can't
have more than 20 open connections in a 10-minute timeframe. That's
why I want to use the connection cache.
The connection cache is used automatically when there is a backlog
of mail to the destination. You are defeating the connection cache
by enforcing a rate limit of 1, which rate limits deliveries, not
connections. DO NOT set a rate limit.
Post by Rafael Azevedo - IAGENTE
With my configuration, I have only one process for this transport,
and I'm also throttling delivery, waiting 1 second after each sent
message before sending another one.
Instead of setting a process limit of 1, you can just specify an
explicit nexthop for the domains whose concurrency you want to
aggregate:

example.com mumble:example.com
example.net mumble:example.com
example.edu mumble:example.com
...

This should help the queue manager schedule deliveries to these
domains, as it will combine the queues for all the domains that use
the transport into a single queue (while using the MX records of a
suitably chosen single domain).
Post by Rafael Azevedo - IAGENTE
And since this transport handles only specific domains, I really
don't have to worry about receiver policies, because they told me
to send as much as I can using the same connection, avoiding opening
one connection per message.
Don't enable rate delays. Do specify a common nexthop for all domains
that share the transport. Don't mess with the connection cache timers.
--
Viktor.
Rafael Azevedo - IAGENTE
2013-01-07 16:37:03 UTC
Hi Viktor, thanks for helping.

I've done something very similar.

I created a separate named transport for specific domains, and pointed all the domains that need special treatment at this named transport.

So, since I'm using Postfix + MySQL, I have a transport table with all the domains and their destination transport. It's quite the same thing you're proposing.

Yet I still have the same problem. It's worthless to have all the domains I want on a specific transport without controlling the throughput.

I don't need just a transport queue for this; I need to control the throughput for all domains on this transport. That's why I'm working with the connection cache timers. I've also noticed a huge delivery improvement for Hotmail and Yahoo since I started experimenting.

So in real life, I have about 10,000 domains hosted at the same hosting company. This company keeps rigid control of its resources. I took all the domains we had traffic to in the last 3 months, looked up their hosting company, mapped everything, and put them all on this named transport. Now all I need is to control delivery to these specific destinations.

Basically this is what I need to be careful about: not sending more than 1,000 messages in 10 minutes and not having more than 20 open connections in the same time frame.

Is there anything else I can do to get better control of my throughput?

Any help would be very appreciated.

Thanks in advance.

Att.
Viktor Dukhovni
2013-01-07 16:47:56 UTC
Post by Rafael Azevedo - IAGENTE
I've done something very similar.
If you want help, please take some time to read and follow the
advice you receive completely and accurately. "Similar" is another
way of saying "incorrect".
Post by Rafael Azevedo - IAGENTE
I created different named transports for specific domains and
have all domains I need a special treatment to use this named
transport.
To achieve a total concurrency limit across multiple destination
domains, you must specify a common nexthop, not just a common
transport.
Post by Rafael Azevedo - IAGENTE
So since I'm using Postfix + MySQL, I have a transport table with
all the domains and their destination transport. It's quite the same thing
you're proposing.
No, it is not, since it leaves out the common nexthop which
consolidates the queues for all the domains.
Post by Rafael Azevedo - IAGENTE
Yet, I'm still with the same problem.
Do take the time to follow advice completely and accurately.
Post by Rafael Azevedo - IAGENTE
So in the real life, I have about 10.000 domains that are hosted in
the same hosting company. This company has a rigid control of their
resources.
Your best bet is to get whitelisted by the receiving system for a higher
throughput limit.

If your average input message rate for these domains falls below the
current cap, and you're just trying to smooth out the spikes, the
advice I gave is correct, if you're willing to listen.
Post by Rafael Azevedo - IAGENTE
Is there anything else I can do to get better control of my throughput?
Understand that Postfix queues are per transport/nexthop, not merely
per transport. To schedule mail via a specific provider as a single
stream (queue), specify an explicit nexthop for all domains that
transit that provider. Since you're already using an explicit
transport, it is easy to append the appropriate nexthop.
Post by Rafael Azevedo - IAGENTE
Any help would be very appreciated.
Ideally, you will not dismiss help when it is given.
--
Viktor.
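[Editor's note: since the transport table lives in MySQL, the common nexthop can be appended directly in the lookup result. A minimal mysql_table(5) sketch; the host, credentials, table name and provider hostname below are placeholders, not details from this thread:]

```
# /etc/postfix/mysql-transport.cf (sketch; all names are placeholders)
hosts = 127.0.0.1
user = postfix
password = secret
dbname = mail
# Return the transport AND a common nexthop ("slow:provider.example")
# for every domain hosted at the provider, instead of a bare "slow":
query = SELECT 'slow:provider.example' FROM slow_domains WHERE domain = '%s'
```

With a bare "slow" result each domain keeps its own per-domain queue; returning "slow:provider.example" consolidates them into a single transport/nexthop queue as Viktor describes.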
Rafael Azevedo - IAGENTE
2013-01-07 17:06:42 UTC
Hi Viktor,

Thanks once again for helping me on this.

Please understand that I'm very "open" and thankful for any help. I'm also trying to understand what you meant.

Getting whitelisted is always the best solution, but believe me, there are some providers that just don't answer any email; they won't even help us work in compliance with their rules. That's why I'm asking for help here.

Sometimes you guys speak in a very advanced language, and it may be hard for some people to understand what you mean. Worse is when we try to explain our problem and aren't clear enough. So I tried to explain myself better, and then you came up with another solution.

Anyway, I'll search how to use this "next hoop" feature and see if it fixes the issue. Although I'm still having to respect the amount of message per time frame so the question persists: how can I low down delivery to these destinations without opening too many connections to them? Having them all in one only transport/nexthoop will not fix the problem if I don't control the throughput, right?

Sorry for the questions, I'm really trying to understand the solution here.

Thanks once again.

Att.
Rafael Azevedo - IAGENTE
2013-01-07 17:19:39 UTC
Hi Viktor,

I was reading the documentation and found out something very interesting.

If I use mumble_destination_concurrency_limit = 1, the destination is a recipient not a domain.

Since I'm trying to control the throughput per destination domain, it is necessary to use a destination_concurrency_limit > 1; in that case, I believe mumble_destination_concurrency_limit = 2 would be good for this issue.

default_destination_concurrency_limit (default: 20)
The default maximal number of parallel deliveries to the same destination. This is the default limit for delivery via the lmtp(8), pipe(8), smtp(8) and virtual(8) delivery agents. With per-destination recipient limit > 1, a destination is a domain, otherwise it is a recipient.

Is this correct?

Thanks in advance.

Att.
Rafael Azevedo - IAGENTE
2013-01-07 17:29:53 UTC
Hi Viktor,

Thanks for the help.

I believe I've activated the next hop feature in my transport table.

If I understood it right, all I had to do is tell postfix that these domains belongs to my named transport specifying the domain.

So this is how it is now:
criticaldomain.tld slow:criticaldomain.tld
domain.tld slow:criticaldomain.tld

Is it right?

Thanks once again.

Att.
Viktor Dukhovni
2013-01-07 17:52:46 UTC
Post by Rafael Azevedo - IAGENTE
Anyway, I'll search how to use this "next hoop" feature and see
The term is "nexthop"; it specifies the next system or systems
to which the message will be forwarded en route to its destination
mailbox. With SMTP the nexthop is a domain (subject to MX lookups)
or [gateway] (not subject to MX lookups).

The syntax of transport entries for each delivery agent is specified
in the man page for that delivery agent, see the "SMTP DESTINATION SYNTAX"
section of:

http://www.postfix.org/smtp.8.html
Post by Rafael Azevedo - IAGENTE
Although I'm still having to respect the amount of message per
time frame so the question persists: how can I low down delivery
to these destinations without opening too many connections to them?
You can reduce the connection rate by caching connections, which
works when you consolidate all the domains that use that provider
to a single transport/nexthop.

You can only reduce the message delivery rate by sending less mail.
To reduce the peak message delivery rate, you need to insert
artificial delays between message deliveries, but this defeats
connection reuse. You can't have both if the limits are sufficiently
aggressive. You should probably ignore the message rate limit.

By capping both message rates and connection rates the receiving
system is hostile to legitimate bulk email. If the hosted
users actually want your email, ask them to talk to the provider
on your behalf.

Otherwise, you can spread the load over multiple servers each
of which falls under the rate limits (snow-shoe).
Post by Rafael Azevedo - IAGENTE
Having them all in one only transport/nexthop will not fix the
problem if I don't control the throughput, right?
This will cause connection reuse, which combined with a destination
concurrency limit of 1, will minimize the number of connections
made.
--
Viktor.
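[Editor's note: the two nexthop forms Viktor describes can be sketched in a transport(5) table; the domains and the hostname below are illustrative placeholders:]

```
# Nexthop given as a domain: subject to MX lookups.
example.com     slow:provider.example
# Nexthop in [brackets]: MX lookups suppressed, connect to that host directly.
example.net     slow:[mail.provider.example]
```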
Viktor Dukhovni
2013-01-07 17:57:54 UTC
Post by Rafael Azevedo - IAGENTE
I believe I've activated the next hop feature in my transport table.
If I understood it right, all I had to do is tell postfix that
these domains belongs to my named transport specifying the domain.
criticaldomain.tld slow:criticaldomain.tld
domain.tld slow:criticaldomain.tld
Is it right?
Correct. Together with:

slow_destination_concurrency_limit = 1
slow_destination_concurrency_failed_cohort_limit = 5

and without any:

slow_destination_rate_delay

which is equivalent to:

slow_destination_rate_delay = 0s

also don't change the defaults:

smtp_connection_cache_on_demand = yes
smtp_connection_cache_time_limit = 2s
--
Viktor.
Viktor Dukhovni
2013-01-07 17:59:23 UTC
Post by Rafael Azevedo - IAGENTE
If I use mumble_destination_concurrency_limit = 1, the destination
is a recipient not a domain.
This is wrong. The setting in question is the recipient_limit, not
the concurrency limit.
Post by Rafael Azevedo - IAGENTE
default_destination_concurrency_limit (default: 20)
The default maximal number of parallel deliveries to the same destination. This is the default limit for delivery via the lmtp(8), pipe(8), smtp(8) and virtual(8) delivery agents. With per-destination recipient limit > 1, a destination is a domain, otherwise it is a recipient.
Is this correct?
It says when the "recipient limit > 1".
--
Viktor.
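[Editor's note: the recipient-limit distinction can be made explicit in main.cf. A sketch, reusing the "slow" transport name from this thread; 50 is the documented default for the per-destination recipient limit:]

```
# With a per-destination recipient limit > 1 (default: 50),
# "destination" in the concurrency and rate-delay parameters
# means a domain (or explicit nexthop):
slow_destination_recipient_limit = 50
# Setting it to 1 would make "destination" mean an individual
# recipient address instead:
# slow_destination_recipient_limit = 1
```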
Rafael Azevedo - IAGENTE
2013-01-07 18:07:02 UTC
Thank you so much Viktor, now I fully understand what you said.

Cheers.

Att.
Rafael Azevedo - IAGENTE
2013-01-07 18:24:20 UTC
Hi Viktor,

I've done exactly what you said and noticed that the connection cache is not being used anymore.

I ran a script in a loop sending email to a few recipients, and the cache seems not to be working (after commenting out slow_destination_rate_delay).

Changing slow_destination_rate_delay to 1s enables Postfix's cache usage again.

Can you give me a tip?

Thanks once again.

Att.
Viktor Dukhovni
2013-01-07 18:40:37 UTC
Post by Rafael Azevedo - IAGENTE
I've done exactly what you said and noticed that the connection
cache is not being used anymore.
You have enabled cache-on-demand behaviour. This happens when the active
queue contains a "backlog" of messages to the destination. If your
input rate is sufficiently low, messages leave as quickly as they
arrive and connections are not cached.
Post by Rafael Azevedo - IAGENTE
I ran a script in a loop sending email to a few recipients, and
the cache seems not to be working (after commenting out
slow_destination_rate_delay).
This does not generate mail sufficiently fast; it is delivered as
fast as it arrives, with no backlog.
Post by Rafael Azevedo - IAGENTE
Changing slow_destination_rate_delay to 1s enables Postfix's cache
usage again.
You can set the rate delay to 1s (but not more), provided 1 msg/sec
is above your long-term average message rate to the destination.

If you just want to always cache, in master.cf change the "slow"
entry to add the option:

master.cf:
slow unix ... smtp
-o smtp_connection_cache_destinations=$slow_connection_cache_destinations

and then in main.cf add:

main.cf:
# Perhaps safer:
# slow_connection_cache_destinations = example.com
slow_connection_cache_destinations = static:all

Or instead of 'static:all' just the nexthop you use in the transport
table for the slow domains in question, that way other nexthops that
use the slow transport can still use demand caching. You can of course
also use a table with appropriate keys:

main.cf:
indexed = ${default_database_type}:${config_directory}/
slow_connection_cache_destinations = ${indexed}cache-nexthops

cache-nexthops:
example.com whatever

Don't forget "postfix reload".
--
Viktor.
Wietse Venema
2013-01-07 21:02:36 UTC
Post by Viktor Dukhovni
Post by Rafael Azevedo - IAGENTE
I've done exactly what you said and noticed that the connection
cache is not being used anymore.
You have enabled cache-on-demand behaviour. This happens when the active
queue contains a "backlog" of messages to the destination. If your
input rate is sufficiently low, messages leave as quickly as they
arrive and connections are not cached.
Connection cache time limits are controlled by two parameters: one
in the delivery agent, and one in the scache daemon. It's the second
parameter that he is missing all the time.

Wietse
Viktor Dukhovni
2013-01-08 01:00:28 UTC
Post by Wietse Venema
Post by Viktor Dukhovni
Post by Rafael Azevedo - IAGENTE
I've done exactly what you said and noticed that the connection
cache is not being used anymore.
You have enabled cache-on-demand behaviour. This happens when the active
queue contains a "backlog" of messages to the destination. If your
input rate is sufficiently low, messages leave as quickly as they
arrive and connections are not cached.
Connection cache time limits are controlled by two parameters: one
in the delivery agent, and one in the scache daemon. It's the second
parameter that he is missing all the time.
Yes, but he should NOT change it. It was a sound piece of defensive
programming on your part to discourage abusive configurations.
Postfix should not cache idle connections to remote sites unnecessarily
long, and more than 1-2 seconds is unnecessarily long!

Thus I am not inclined to discuss the safety-net control.
--
Viktor.
Rafael Azevedo - IAGENTE
2013-01-08 12:47:08 UTC
Hi Viktor,

I've added this into my main.cf:

slow_destination_concurrency_failed_cohort_limit = 5

But I noticed that even after a failure, Postfix keeps trying to deliver to the destination.

Question: how can I stop Postfix from trying to deliver emails after a few failures?

I mean, if it is trying to deliver to xyz.com and it fails 5 times, should postfix keep trying to deliver or is there any way that we can stop delivering for some time?

I thought this could be done using _destination_concurrency_failed_cohort_limit. Am I doing something wrong?

After these adjustments I'm still having trouble delivering emails to this specific destination.

This is the error I get:
said: 450 4.7.1 You've exceeded your sending limit to this domain. (in reply to end of DATA command))

I'm really trying to slow down the delivery speed in order to respect the destination's policies. I just can't figure out how to fix this issue.
I've also sent more than 20 emails to the network's administrators and they just won't answer. Reading around on the internet, I found that a lot of people have the same problem with this specific provider.

We send about 50k emails/day to 20k domains hosted on this provider that are being blocked.

Any help would be very appreciated.

Thanks in advance.

Att.
Wietse Venema
2013-01-08 12:58:38 UTC
Post by Rafael Azevedo - IAGENTE
Hi Viktor,
slow_destination_concurrency_failed_cohort_limit = 5
This stops deliveries after 5 ****COHORT**** failures.
Post by Rafael Azevedo - IAGENTE
I mean, if it is trying to deliver to xyz.com and it fails 5 times,
Yes, but you configured Postfix to stop after 5 ****COHORT**** failures.

For more information about ****COHORT**** failures and other parameters:

http://www.postfix.org/SCHEDULER_README.html

Wietse
Wietse Venema
2013-01-08 13:02:29 UTC
Post by Wietse Venema
Post by Rafael Azevedo - IAGENTE
slow_destination_concurrency_failed_cohort_limit = 5
This stops deliveries after 5 ****COHORT**** failures.
Post by Rafael Azevedo - IAGENTE
I mean, if it is trying to deliver to xyz.com and it fails 5 times,
Yes, but you configured Postfix to stop after 5 ****COHORT**** failures.
http://www.postfix.org/SCHEDULER_README.html
In short, Postfix ONLY adjusts concurrency after connect/handshake
failure, NEVER EVER for a 4XX reply to RCPT TO.

Wietse
Wietse Venema
2013-01-08 14:09:29 UTC
Post by Rafael Azevedo - IAGENTE
Hi Wietse,
Is there any way we can adjust Postfix to stop delivering after a
4XX reply?
Postfix will stop delivering after TCP or SMTP handshake failure.
Postfix WILL NOT stop delivering due to 4xx reply AFTER the SMTP
protocol handshake.

Postfix is not a tool to work around receiver policy restrictions.
If you want to send more than a few email messages, then it is your
responsibility to make the necessary arrangements with receivers.

Over and out.

Wietse
Viktor Dukhovni
2013-01-08 14:36:46 UTC
Post by Rafael Azevedo - IAGENTE
slow_destination_concurrency_failed_cohort_limit = 5
This is fine: since you set the concurrency limit to 1, it is
intended to avoid shutting down deliveries after a single connection
failure. As Wietse points out, this does not stop deliveries when
individual recipients are rejected; that is not evidence of the
site being down.
Post by Viktor Dukhovni
Question: how can I stop postfix from trying to deliver emails
after few failures?
It is not possible to automatically throttle deliveries based on 4XX
replies to RCPT TO. This is not a useful signal that Postfix is
sending "too fast", nor is there any good way to dynamically
determine the correct rate.

Sites that impose indiscriminate (assuming you're sending legitimate
email, not spam) rate controls are breaking the email infrastructure.
Sadly, the work-around is to snowshoe---deploy more servers to split the
load over a larger number of IP addresses.
Post by Viktor Dukhovni
I mean, if it is trying to deliver to xyz.com and it fails 5
times, should postfix keep trying to deliver or is there any way
that we can stop delivering for some time?
Only if xyz.com is down, not if it is merely tempfailing RCPT TO.
Post by Viktor Dukhovni
said: 450 4.7.1 You've exceeded your sending limit to this domain.
(in reply to end of DATA command))
Since presumably at this point your connection rate is not high
(connections are being re-used), it seems that they're imposing
a message rate cap as well as a connection rate cap.

Send them less email.
Post by Viktor Dukhovni
I'm really trying to slow down the delivery speed in order to
respect the destination's policies. I just can't figure out how to
fix this issue.
We send about 50k emails/day to 20k domains hosted on this provider
that are being blocked.
The output rate cannot on average fall below the input rate. The
input rate is approximately 1/sec (there are 86400 seconds in a
day). Thus the slowest you can send is with a "rate delay" of 1s.
If that's not slow enough, you're out of luck, and have to buy
more servers (possibly deploying them on separate networks).

The suggestion to turn off rate delays was based on an assumption
that they told you to avoid connecting too often and wanted all
the mail over a single connection (you wanted connection re-use),
but connection re-use works best when there is no rate delay.
A rate delay of 1s is still compatible with connection re-use,
and is the largest you can specify and still send more than
43.2k messages a day.
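Viktor's capacity arithmetic can be double-checked with a short sketch (Python purely for illustration; the only inputs are the numbers quoted in the thread, and it assumes concurrency 1, i.e. one message per rate-delay interval):

```python
# Back-of-the-envelope check of the rate-delay arithmetic above.
# With destination concurrency 1, one message goes out per rate-delay
# interval, so daily capacity is simply 86400 / delay.

SECONDS_PER_DAY = 86400

def daily_capacity(rate_delay_s):
    """Messages deliverable per day at a given whole-second rate delay."""
    return SECONDS_PER_DAY // rate_delay_s

# A 1s delay comfortably covers 50k msgs/day; a 2s delay already does not.
print(daily_capacity(1))  # 86400
print(daily_capacity(2))  # 43200
```

This is why 1s is the largest whole-second delay that still clears a ~50k/day input rate.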

It may be simplest to outsource your traffic to an existing large
bulk email operator.
--
Viktor.
Rafael Azevedo - IAGENTE
2013-01-08 14:51:18 UTC
Permalink
Thank you Wietse.

We have a huge mail volume, which is why I'm trying to figure out a better way to deal with it.

Many providers have their own restrictions. We do work in compliance with most of them, but there are a few that just won't help at all, so it's easy to tell me to make the necessary arrangements when they don't even have a support or abuse department to get involved.

So since the problem is in my hands, I must find out a way to deal with it. Trying to slow down delivery speed is one way to get through.

I truly believe that postfix is the best MTA ever, but you might agree with me that when the receiver starts blocking the sender, it's worthless to keep trying to deliver.
The safest way is to stop delivering to these servers and try again later.

I just can't believe that postfix doesn't have a way to deal with this. It would make postfix much more efficient in delivery terms.

Anyway, thanks for your time and all the help. It was for sure very appreciated.

Any help here would also be appreciated.

Thanks in advance.

Att.
--
Rafael Azevedo | IAGENTE
Fone: 51 3086.0262
MSN: ***@hotmail.com
Visite: www.iagente.com.br
Post by Wietse Venema
Hi Witsie,
Is there anyway we can adjust Postfix to stop delivering after a
4XX reply?
Postfix will stop delivering after TCP or SMTP handshake failure.
Postfix WILL NOT stop delivering due to 4xx reply AFTER the SMTP
protocol handshake.
Postfix is not a tool to work around receiver policy restrictions.
If you want to send more than a few email messages, then it is your
responsibility to make the necessary arrangements with receivers.
Over and out.
Wietse
Wietse Venema
2013-01-08 15:34:30 UTC
Permalink
Post by Rafael Azevedo - IAGENTE
I truly believe that postfix is the best MTA ever, but you might
agree with me that when the receiver start blocking the sender,
its worthless to keep trying to deliver.
1) Postfix will back off when the TCP or SMTP handshake fails. This
is a clear signal that a site is unavailable.

2) Postfix will not back off after [54]XX in the middle of a session.
IN THE GENERAL CASE this does not mean that the receiver is blocking
the sender, and backing off would be the wrong strategy.

Wietse
Rafael Azevedo - IAGENTE
2013-01-08 15:59:14 UTC
Permalink
But Wietse, would you agree with me that error 4XX is (in general cases) a temporary error?

Why keep trying when we have a clear signal of a temporary error?

Also, if we had a temporary error control (number of deferred messages by recipient), it would be easy to identify when postfix should stop trying at least for a while.

Att.
--
Rafael Azevedo | IAGENTE
Fone: 51 3086.0262
MSN: ***@hotmail.com
Visite: www.iagente.com.br
Post by Wietse Venema
Post by Rafael Azevedo - IAGENTE
I truly believe that postfix is the best MTA ever, but you might
agree with me that when the receiver start blocking the sender,
its worthless to keep trying to deliver.
1) Postfix will back off when the TCP or SMTP handshake fails. This
is a clear signal that a site is unavailable.
2) Postfix will not back off after [54]XX in the middle of a session.
IN THE GENERAL CASE this does not mean that the receiver is blocking
the sender, and backing off would be the wrong strategy.
Wietse
Viktor Dukhovni
2013-01-08 16:07:31 UTC
Permalink
But Wietse, would you agree with me that error 4XX is (in general
cases) a temporary error?
It is a temporary error for *that* recipient. It is not a global
indication that the site is temporarily unreachable. Nor is there
any indication how long one should wait, nor that waiting will make
things any better.

Generally, delaying delivery *increases* congestion, since more mail
arrives in the mean-time, and once delivery resumes the volume is
even higher.
Why keep trying when we have a clear signal of a temporary error?
Postfix does not "keep trying", it defers the message in question
and moves on to the next one. Your mental model of email queue
management is too naive.

This is a very difficult problem, and there is no simple answer.
Also, if we had a temporary error control (number of deferred
messages by recipient), it would be easy to identify when postfix
should stop trying at least for a while.
Given an arrival rate of ~50k msgs/day, you need to send at least
1 msg/sec to avoid growing an infinitely large queue. This is basic
arithmetic. Going slower does not work; your queue grows without
bound.

Let the recipients know that if they want to continue to receive
your email they should choose a new provider that is willing to
work with legitimate senders to resolve mail delivery issues. Then
stop sending them email.
--
Viktor.
Wietse Venema
2013-01-08 16:21:21 UTC
Permalink
Post by Rafael Azevedo - IAGENTE
Why keep trying when we have a clear signal of a temporary error?
As Viktor noted, Postfix does not keep trying the SAME delivery.

Instead, Postfix tries to deliver a DIFFERENT message. It would be
incorrect IN THE GENERAL CASE to postpone ALL deliveries to a site
just because FIVE recipients were unavailable.

Wietse
Rafael Azevedo - IAGENTE
2013-01-08 16:38:45 UTC
Permalink
Post by Wietse Venema
Post by Rafael Azevedo - IAGENTE
Why keep trying when we have a clear signal of a temporary error?
As Victor noted Postfix does not keep trying the SAME delivery.
Yes, you're right, and I know that. But it keeps trying for other recipients in the same domain.

Let's just say Yahoo starts blocking because of unusual traffic. Yahoo will keep telling "me" to try again later. In the meantime, new emails to be sent arrive and we never get the chance to get unblocked.

So postfix gets the next yahoo recipient and tries to deliver without considering that yahoo does not want us to keep trying for a while.

This is just an example. We don't have problems delivering to Yahoo, but to smaller providers.
Post by Wietse Venema
Instead, Postfix tries to deliver a DIFFERENT message. It would be
incorrect IN THE GENERAL CASE to postpone ALL deliveries to a site
just because FIVE recipients were unavailable.
That's why it would be interesting to have a way to configure that. Let's say we have 100 deferred messages in sequence. Why keep trying? This way we lose time and processing, and have no way to improve our reputation, since we don't stop bugging them after they tell us to stop for a while.
Post by Wietse Venema
Wietse
Anyway, it doesn't seem to be possible to do this.

Thanks guys.

Rafael.
Rafael Azevedo - IAGENTE
2013-01-08 16:42:42 UTC
Permalink
Att.
--
Rafael Azevedo | IAGENTE
Fone: 51 3086.0262
MSN: ***@hotmail.com
Visite: www.iagente.com.br
Post by Viktor Dukhovni
But Wietse, would you agree with me that error 4XX is (in general
cases) a temporary error?
It is a temporary error for *that* recipient. It is not a global
indication that the site is temporary unreachable. Nor is there
any indication how long one should wait, nor that waiting will make
things any better.
Yes, you're right. There is no indication of how long we should wait; that's why it would be very nice to have a parameter to determine that (just like maximal_queue_lifetime).
Post by Viktor Dukhovni
Generally, delaying deliver *increases* congestion, since more mail
arrives in the mean-time, and once delivery resumes the volume is
even higher.
That's exactly the problem. We have what I call an "mxcluster", which is a box with hundreds of postfix instances running, splitting the traffic between them. It helps, but it's not solving the major problem.
Post by Viktor Dukhovni
Why keep trying when we have a clear signal of a temporary error?
Postfix does not "keep trying", it defers the message in question
and moves on to the next one. Your mental model of email queue
management is too naive.
This is a very difficult problem, and there is no simple answer.
Yes, it tries the next message. But what about when the next message is to the same domain and also happens to get deferred?
Post by Viktor Dukhovni
Also, if we had a temporary error control (number of deferred
messages by recipient), it would be easy to identify when postfix
should stop trying at least for a while.
Given an arrival rate of ~50k msgs/day, you need to send at least
1 msg/sec to avoid growing an infinitely large queue. This is basic
arithmetic. Gowing slower does not work, your queue grows without
bound.
That's why we have multiple instances of postfix running, to split the traffic among them.
Post by Viktor Dukhovni
Let the recipients know that if they want to continue to receive
your email they should choose a new provider that is willing to
work with legitimate senders to resolve mail delivery issues. Then
stop sending them email.
Yes and no. Some SMTPs get higher volumes of mail, but the entire traffic is not centralized in only one smtp.
Post by Viktor Dukhovni
--
Viktor.
Rafael
Mark Goodge
2013-01-08 16:44:31 UTC
Permalink
Post by Rafael Azevedo - IAGENTE
Post by Wietse Venema
Post by Rafael Azevedo - IAGENTE
Why keep trying when we have a clear signal of a temporary
error?
As Victor noted Postfix does not keep trying the SAME delivery.
Yes you're right and I know that. But it keeps trying for another
recipients in the same domain.
Which is absolutely the correct behaviour.

One of the most common reasons for a temporary delivery failure is a
full mailbox. Or, where the remote server is acting as a
store-and-forward, a temporary inability to verify the validity of the
destination address.

I'd be very annoyed if I didn't get an email I was expecting because
someone else on my system had forgotten to empty their mailbox, or
because another customer of my upstream server had an outage and wasn't
able to verify recipients.

Mark
--
Please take a short survey about the Leveson Report: http://meyu.eu/ak
Wietse Venema
2013-01-08 16:48:14 UTC
Permalink
Post by Rafael Azevedo - IAGENTE
Post by Wietse Venema
Instead, Postfix tries to deliver a DIFFERENT message. It would be
incorrect IN THE GENERAL CASE to postpone ALL deliveries to a site
just because FIVE recipients were unavailable.
Thats why it would be interesting to have a way to configure that.
Configurable, perhaps. But it would be a mistake to make this the
default strategy.

That would make Postfix vulnerable to a trivial denial of service
attack where one bad recipient can block all mail for all other
recipients at that same site.

Imagine if I could block all mail for gmail.com in this manner.

If I understand correctly, your proposal is to treat all 4xx and
5xx delivery errors the same as a failure to connect error.

Wietse
Reindl Harald
2013-01-08 16:49:06 UTC
Permalink
Post by Mark Goodge
Post by Rafael Azevedo - IAGENTE
Post by Wietse Venema
Post by Rafael Azevedo - IAGENTE
Why keep trying when we have a clear signal of a temporary
error?
As Victor noted Postfix does not keep trying the SAME delivery.
Yes you're right and I know that. But it keeps trying for another
recipients in the same domain.
Which is absolutely the correct behaviour.
One of the most common reasons for a temporary delivery failure is a full mailbox. Or, where the remote server is
acting as a store-and-forward, a temporary inability to verify the validity of the destination address.
I'd be very annoyed if I didn't get an email I was expecting because someone else on my system had forgotten to
empty their mailbox, or because another customer of my upstream server had an outage and wasn't able to verify
recipients.
yes, that is all right for any "normal" mail

but if you send out a newsletter you likely have a lot of
users at big ISPs; Telekom Austria even rejects temporarily
for whitelisted senders

since every smart admin splits newsletter relays from the
normal business mail, a configuration option would help
while not hurting your case
Reindl Harald
2013-01-08 16:56:41 UTC
Permalink
Post by Wietse Venema
Post by Rafael Azevedo - IAGENTE
Post by Wietse Venema
Instead, Postfix tries to deliver a DIFFERENT message. It would be
incorrect IN THE GENERAL CASE to postpone ALL deliveries to a site
just because FIVE recipients were unavailable.
Thats why it would be interesting to have a way to configure that.
Configurable, perhaps. But it would a mistake to make this the
default strategy.
That would make Postfix vulnerable to a trivial denial of service
attack where one bad recipient can block all mail for all other
recipients at that same site.
Imagine if I could block all mail for gmail.com in this manner.
If I understand correctly, your proposal is to treat all 4xx and
5xx delivery errors the same as a failure to connect error.
as i understand it, his proposal is: if delivery to a configurable
number of RCPTs at the same destination server tempfails, also delay
the following targets with the same destination for x minutes,
instead of triggering another 100, 200, 300 4xx errors because the
destination does not like mail from your IP for some time
Rafael Azevedo - IAGENTE
2013-01-08 17:01:47 UTC
Permalink
One of the most common reasons for a temporary delivery failure is a full mailbox. Or, where the remote server is acting as a store-and-forward, a temporary inability to verify the validity of the destination address.
I don't agree with that. Connection timeout is the most common reason for a temporary failure (in my case).
I'd be very annoyed if I didn't get an email I was expecting because someone else on my system had forgotten to empty their mailbox, or because another customer of my upstream server had an outage and wasn't able to verify recipients.
Mark, I don't think that postfix should stop sending to that domain forever, or that it should send the email back to the sender. I just think that postfix could have a way to hold the mail queue for a specific time based on specific and consecutive errors. Let's say, for example, 100 errors in sequence to the same destination domain. Why keep trying if we're unable to deliver to that domain at the time?
Mark
--
Please take a short survey about the Leveson Report: http://meyu.eu/ak
Rafael
Rafael Azevedo - IAGENTE
2013-01-08 17:04:37 UTC
Permalink
Post by Wietse Venema
Configurable, perhaps. But it would a mistake to make this the
default strategy.
That would make Postfix vulnerable to a trivial denial of service
attack where one bad recipient can block all mail for all other
recipients at that same site.
Not if it could be parameterized. As I said, what if we get 100 errors in sequence? Trying to deliver another 10k emails knowing that you're not allowed to send email at this time is more like a DoS attack. We're consuming the server's resources when we shouldn't connect to them at all.
Post by Wietse Venema
Imagine if I could block all mail for gmail.com in this manner.
If I understand correctly, your proposal is to treat all 4xx and
5xx delivery errors the same as a failure to connect error.
No, that's not what I meant. What I said is that it would be nice to have a way to configure specific errors to put the queue on hold for those destinations which we're unable to connect to at the time.
Post by Wietse Venema
Wietse
Rafael
Rafael Azevedo - IAGENTE
2013-01-08 17:05:57 UTC
Permalink
Yes Reindl, you got the point. I just want to wait for a while before retrying to send email to the same destination.
Post by Reindl Harald
Post by Wietse Venema
Post by Rafael Azevedo - IAGENTE
Post by Wietse Venema
Instead, Postfix tries to deliver a DIFFERENT message. It would be
incorrect IN THE GENERAL CASE to postpone ALL deliveries to a site
just because FIVE recipients were unavailable.
Thats why it would be interesting to have a way to configure that.
Configurable, perhaps. But it would a mistake to make this the
default strategy.
That would make Postfix vulnerable to a trivial denial of service
attack where one bad recipient can block all mail for all other
recipients at that same site.
Imagine if I could block all mail for gmail.com in this manner.
If I understand correctly, your proposal is to treat all 4xx and
5xx delivery errors the same as a failure to connect error.
as i understand his proposal is if deliver to a configureable
amount of RCPT's with the same destination server delay also
the following targets with the same destination x minutes
instead trigger another 100,200,300 4xx errors because the
destination does not like mail from your IP for some time
Scott Lambert
2013-01-08 17:49:05 UTC
Permalink
Post by Rafael Azevedo - IAGENTE
Post by Wietse Venema
Configurable, perhaps. But it would a mistake to make this the
default strategy.
That would make Postfix vulnerable to a trivial denial of service
attack where one bad recipient can block all mail for all other
recipients at that same site.
Not if it could me parametrized. As I said, what if we get 100 errors
in sequence? Keep trying to deliver another 10k emails knowing that
you're not allowed to send email at this time is more like a DoS
attack. We're consuming server's resource when we shouldn't connect to
them at all.
Post by Wietse Venema
Imagine if I could block all mail for gmail.com in this manner.
If I understand correctly, your proposal is to treat all 4xx and
5xx delivery errors the same as a failure to connect error.
No thats not what I meant. What I said is that would be nice to have
a way to configure specific errors to put the queue on hold for those
destinations which we're unable to connect at the time.
Could you not just watch your logs and count temporary errors for
each destination? The script could then reconfigure your mailertable
to point that domain to a hold transport (or even another box which
is configured to send messages very slowly). After some amount of
time passes, change back to the normal SMTP transport.

I've never needed to do any such thing. But, I believe that would
be possible without depending on changes to Postfix, which may
never happen.
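Scott's log-watching idea can be sketched roughly as below. Everything here is an illustrative assumption, not a Postfix feature: the log pattern is a simplified smtp(8) delivery line, the threshold is arbitrary, and the helper names are hypothetical.

```python
# Sketch: count consecutive 4xx deferrals per destination domain in the
# mail log and, past a threshold, emit transport-map override lines that
# point the domain at a "hold" (or very slow) transport.

import re
from collections import defaultdict

DEFER_RE = re.compile(r'to=<[^@>]+@([^>]+)>.*status=deferred')
SENT_RE = re.compile(r'to=<[^@>]+@([^>]+)>')
THRESHOLD = 100  # consecutive deferrals before we hold a domain (arbitrary)

def domains_to_hold(log_lines, threshold=THRESHOLD):
    streak = defaultdict(int)
    held = set()
    for line in log_lines:
        m = DEFER_RE.search(line)
        if m:
            dom = m.group(1)
            streak[dom] += 1
            if streak[dom] >= threshold:
                held.add(dom)
        elif 'status=sent' in line:
            # a successful delivery resets the streak for that domain
            m2 = SENT_RE.search(line)
            if m2:
                streak[m2.group(1)] = 0
    return held

def transport_override(held, transport='hold'):
    """Lines suitable for a hash: table used in transport_maps."""
    return [f'{dom}\t{transport}:' for dom in sorted(held)]
```

A cron job could feed recent log lines through this, write the override file, run postmap, and later remove the entries to restore the normal SMTP transport.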
--
Scott Lambert KC5MLE Unix SysAdmin
***@lambertfam.org
Wietse Venema
2013-01-08 18:08:21 UTC
Permalink
Post by Rafael Azevedo - IAGENTE
Post by Wietse Venema
Configurable, perhaps. But it would a mistake to make this the
default strategy.
That would make Postfix vulnerable to a trivial denial of service
attack where one bad recipient can block all mail for all other
recipients at that same site.
Not if it could me parametrized. As I said, what if we get 100
errors in sequence?
Big deal. Now I can block all mail for gmail.com by getting 100
email messages into your queue.

I could add an option to treat this in the same manner as "failure
to connect" errors (i.e. temporarily skip all further delivery to
this site). However, this must not be the default strategy, because
this would hurt the far majority of Postfix sites which is not a
bulk email sender.

Currently, Postfix error processing distinguishes between (hard
versus soft) errors, and between errors (during versus after) the
initial protocol handshake. I don't have time to develop more
detailed error processing strategies, especially not since this is
of no benefit to the majority of the installed base.

Wietse
Reindl Harald
2013-01-08 18:57:40 UTC
Permalink
Post by Wietse Venema
Post by Rafael Azevedo - IAGENTE
Post by Wietse Venema
Configurable, perhaps. But it would a mistake to make this the
default strategy.
That would make Postfix vulnerable to a trivial denial of service
attack where one bad recipient can block all mail for all other
recipients at that same site.
Not if it could me parametrized. As I said, what if we get 100
errors in sequence?
Big deal. Now I can block all mail for gmail.com by getting 100
email messages into your queue
how come?
how do you get gmail.com to answer any delivery from you with 4xx?

if it is because of the 100 messages, you are already done because
they still reject your messages; and if the other side has some
greylisting-like rules to reject messages from you as long as there
have not been at least 5 minutes without another connection, it is
even easier to block you with the current behavior by sending one
message per minute after the threshold is reached the first time,
because you will never go under the 5 minutes
Viktor Dukhovni
2013-01-08 19:12:36 UTC
Permalink
Post by Wietse Venema
I could add an option to treat this in the same manner as "failure
to connect" errors (i.e. temporarily skip all further delivery to
this site). However, this must not be the default strategy, because
this would hurt the far majority of Postfix sites which is not a
bulk email sender.
Such a feedback mechanism is a sure-fire recipe for congestive
collapse:

- A brief spike in traffic above the steady input rate causes the
message rate to trigger rate limits at the remote destination.

- Postfix throttles to the destination for many minutes.

- A lot of additional mail arrives while the destination is throttled.

- When the queue is unthrottled, the message rate will immediately
spike above the remote limit. And the queue is throttled again.

- Lather, rinse, repeat

The *only* way to deal with rate limits, is to avoid traffic spikes,
which is only possible if you also avoid traffic troughs, and send
at a *steady* rate that is *below* the remote limit, but above the
input rate.

When the steady-state input rate is above the remote limit, no
scheduler strategy can avoid congestive collapse.
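The stop-and-go failure mode described above can be illustrated with a toy queue model. All numbers here are made-up assumptions (bursty input averaging 1 msg/sec, a remote cap of 2 msgs/sec, a 5-minute penalty window), not measurements of any real site:

```python
# Toy model of the congestive-collapse argument: compare a paced sender
# that never exceeds the remote limit with a naive one that pushes
# everything at once and then backs off after being penalized.

def simulate(horizon=3600, burst=5, period=5, limit=2,
             backoff=300, naive=False):
    queue = 0
    blocked_until = 0
    for t in range(horizon):
        if t % period == 0:
            queue += burst                 # avg input: burst/period msgs/s
        if t < blocked_until:
            continue                       # throttled: active queue parked
        attempt = queue if naive else min(queue, limit)
        queue -= min(attempt, limit)       # remote accepts at most `limit`
        if attempt > limit:
            blocked_until = t + backoff    # spike triggers a penalty window
    return queue                           # backlog left after one hour

print(simulate(naive=False), simulate(naive=True))
```

The paced sender drains every burst and ends with an empty queue; the stop-and-go sender spends almost the whole hour in penalty windows and its backlog grows essentially without bound, which is the "lather, rinse, repeat" cycle above.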

For remote sites that enforce indiscriminate rate limits (for all
senders, not just those who have a history of reported spam), the
only strategy is to:

- Send below the rate that triggers the rate-limit

- Buy more machines, and hope that rate limits are per
sending IP, not per sending domain. (snowshoe).

- In some cases, the rate limits are reputation dependent,
and it takes time to "build" a good reputation. In that case
one needs to ramp traffic to the domain over time, by generating
less mail initially, and building volume over time. This is done
outside of the MTA, in the mail generating engine.

Thinking that delaying sending is a good idea is dead wrong. The
optimal strategy is to always send as fast as possible and no
faster!
--
Viktor.
Wietse Venema
2013-01-08 19:16:44 UTC
Permalink
Post by Reindl Harald
Post by Wietse Venema
Big deal. Now I can block all mail for gmail.com by getting 100
email messages into your queue
how come?
how do you get gmail.com to answer any delivery from you with 4xx?
He wants to temporarily suspend delivery when a site has 5 consecutive
delivery errors without distinguishing between SMTP protocol stages
(such an "ignore protocol stage" switch could be added to Postfix).

To implement a trivial DOS, I need 5 consecutive messages in his
mail queue, plus a handful accounts that don't accept mail.

I have no idea where he got the 100 from - that number was not part
of his original problem description.

Wietse
Reindl Harald
2013-01-08 19:25:45 UTC
Permalink
Post by Wietse Venema
Post by Reindl Harald
Post by Wietse Venema
Big deal. Now I can block all mail for gmail.com by getting 100
email messages into your queue
how come?
how do you get gmail.com to answer any delivery from you with 4xx?
He wants to temporarily suspend delivery when site has 5 consecutive
delivery errors without distinguishing between SMTP protocol stages
(such an "ignore protocol stage" switch could be added to Postfix).
To implement a trivial DOS, I need 5 consecutive messages in his
mail queue, plus a handful accounts that don't accept mail.
I have no idea where he got the 100 from - that number was not part
of his original problem description
the 100 was brought into the game by yourself :-)

however, it would make sense to delay the next try to, for example,
@aon.at after 10, 20, 30 4xx errors while sending out a newsletter,
and restart trying to deliver these messages 5, 10, 15 minutes later

on the other hand: if you do not see sense here and are not willing
to spend time on this idea -> postfix is your baby and still one
of the greatest pieces of software i have been allowed to use in many
years, and we will not die if you reject the feature wish
Wietse Venema
2013-01-08 19:39:17 UTC
Permalink
Post by Viktor Dukhovni
Post by Wietse Venema
I could add an option to treat this in the same manner as "failure
to connect" errors (i.e. temporarily skip all further delivery to
this site). However, this must not be the default strategy, because
this would hurt the far majority of Postfix sites which is not a
bulk email sender.
Such a feedback mechanism is a sure-fire recipe for congestive
That depends on their average mail input rate. As long as they can
push out the mail from one input burst before the next input burst
happens, then it may be OK that the output flow stutters sometimes.

Wietse
Viktor Dukhovni
2013-01-08 19:51:44 UTC
Permalink
Post by Wietse Venema
Post by Viktor Dukhovni
Post by Wietse Venema
I could add an option to treat this in the same manner as "failure
to connect" errors (i.e. temporarily skip all further delivery to
this site). However, this must not be the default strategy, because
this would hurt the far majority of Postfix sites which is not a
bulk email sender.
Such a feedback mechanism is a sure-fire recipe for congestive
That depends on their average mail input rate. As long as they can
push out the mail from one input burst before the next input burst
happens, then it may be OK that the output flow stutters sometimes.
This is most unlikely. The sample size before the remote side clamps
down is likely small, so the effective throughput per throttle
interval will be very low.

If Postfix backs off initially for 5 minutes, it will fully drain
the active queue to deferred, then get a handful of messages
through, then backoff for 10 minutes (doubling each time up to the
maximal_backoff_time). This won't push out 50k messages/day.

The optimal strategy is to send each message as quickly as possible,
but not faster than the remote rate limit, i.e. tune the rate delay.
Perhaps we need to measure the rate delay in tenths of seconds for
a bit more flexibility.

One can imagine adding a feedback mechanism to the rate delay (with
fractional positive/negative feedback), but getting a stable
algorithm out of this is far from easy.
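One classic shape for the fractional positive/negative feedback mentioned here is AIMD (additive-increase/multiplicative-decrease, borrowed from TCP congestion control). The sketch below is purely hypothetical, not anything Postfix implements, and as the post notes, proving such a loop stable against bursty penalties is the hard part:

```python
# Hypothetical AIMD-style update for a fractional rate delay: lengthen
# the delay multiplicatively on a tempfail, shorten it additively on
# success. All constants are illustrative assumptions.

def next_rate_delay(current_s, last_was_tempfail,
                    backoff_factor=2.0, probe_step=0.1,
                    floor_s=0.1, ceiling_s=10.0):
    """Return the next per-message rate delay in (fractional) seconds."""
    if last_was_tempfail:
        new = current_s * backoff_factor   # slow down hard on a 4xx
    else:
        new = current_s - probe_step       # probe gently faster on success
    return min(max(new, floor_s), ceiling_s)
```

A delivery agent using this would call it after each attempt; the floor corresponds to the tenths-of-seconds granularity suggested above, and the ceiling keeps the output rate from collapsing entirely.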

Throttling the active queue is not an answer. With rate limits, one
wants to slow down, not stop, but throttling is not "slowing down".

Barring a clean "slow down" signal, and a stable feedback mechanism,
the only strategy is manually tuned rate delays, and spreading the
load over multiple sending IPs (Postfix instances don't help if
they share a single IP).
--
Viktor.
Reindl Harald
2013-01-08 20:17:05 UTC
Permalink
Post by Viktor Dukhovni
Post by Wietse Venema
Post by Viktor Dukhovni
Post by Wietse Venema
I could add an option to treat this in the same manner as "failure
to connect" errors (i.e. temporarily skip all further delivery to
this site). However, this must not be the default strategy, because
this would hurt the far majority of Postfix sites which is not a
bulk email sender.
Such a feedback mechanism is a sure-fire recipe for congestive
That depends on their average mail input rate. As long as they can
push out the mail from one input burst before the next input burst
happens, then it may be OK that the output flow stutters sometimes.
This is most unlikely. The sample size before the remote side clamps
down is likely small, so the effective throughput per throttle
interval will be very low.
If Postfix backs off initially for 5 minutes, it will fully drain
the active queue to deferred, then get a handfull of messages
through, then backoff for 10 minutes (doubling each time up to the
maximal_backoff_time). This won't push out 50k messages/day
you missed the PER DESTINATION

* not "initially"
* not globally
* after a CONFIGURABLE number of temporary failures to the same destination

on a dedicated MTA for newsletters it would even improve in many
cases the number of messages per day and the whole reputation
of the ip-address you are sending from

on a NORMAL mailserver with human senders it would make no sense

that's why such a thing must not be the default but would be nice to have
Wietse Venema
2013-01-08 20:40:57 UTC
Permalink
Post by Viktor Dukhovni
Post by Wietse Venema
I could add an option to treat this in the same manner as "failure
to connect" errors (i.e. temporarily skip all further delivery to
this site). However, this must not be the default strategy, because
this would hurt the far majority of Postfix sites which is not a
bulk email sender.
Such a feedback mechanism is a sure-fire recipe for congestive
Given that sites will tempfail a delivery attempt to signal a "stay
away" condition, it makes sense to provide an option for bulkmailers
to treat those responses accordingly. That does not mean that this
option will be sufficient to solve all mail delivery problems. Some
parts will require human intervention.

The "stay away" condition is similar to "failure to connect", except
that one might want to use different timers. With "failure to
connect" the timer choice depends more on sender preferences, while
with "failure to deliver" the timer choice would depend more on the
destination.

Coming to the issue of receiver's limits: when receivers slam hard
on the brakes, I don't see how Postfix could automatically tune the
delivery rate (as a fictitious example, suppose that a transgression
results in a 30-minute penalty during which no mail will be accepted
from the client IP address).

My conclusion is that Postfix can continue to provide basic policies
that avoid worst-case failure modes, but the choice of the settings
that control those policies is better left to the operator. If the
receiver slams on the brakes, then Postfix can suspend deliveries,
but the sender operator will have to adjust the sending rate.

Wietse
Reindl Harald
2013-01-08 21:02:31 UTC
Permalink
Post by Wietse Venema
My conclusion is that Postfix can continue to provide basic policies
that avoid worst-case failure modes, but the choice of the settings
that control those policies is better left to the operator. If the
receiver slams on the brakes, then Postfix can suspend deliveries,
but the sender operator will have to adjust the sending rate.
exactly this is the point

thank you for your understanding and thoughts!
Viktor Dukhovni
2013-01-09 01:57:10 UTC
Permalink
Post by Reindl Harald
Post by Wietse Venema
My conclusion is that Postfix can continue to provide basic policies
that avoid worst-case failure modes, but the choice of the settings
that control those policies is better left to the operator. If the
receiver slams on the brakes, then Postfix can suspend deliveries,
but the sender operator will have to adjust the sending rate.
exactly this is the point
thank you for your understanding and thoughts!
Suspending delivery and punting all messages from the active queue
for the designated nexthop is not a winning strategy. In this state
mail delivery to the destination is in most cases unlikely to
recover without manual intervention.

I would posit that neither Reindl nor the OP, or that many others
really understand what they are asking for. If they understood,
they would stop asking for it.

When faced with a destination that imposes tight rate limits you
must pre-configure your MTA to always stay under the limits. Nothing
good happens when the Postfix output rate under load exceeds the
remote limit whether you throttle the queue repeatedly or not.

The best that one can hope for is for Postfix to dynamically apply
a rate delay that is guaranteed to be slow enough to get under the
limit, and then gradually reduce it.

Throttling the destination (moving all active mail to deferred)
is a pre-programmed MTA outage, I'd not want to operate any system
that behaves that way, and neither should you, whether you know
it or not.
--
Viktor.
Reindl Harald
2013-01-09 02:06:58 UTC
Permalink
Post by Viktor Dukhovni
Post by Reindl Harald
Post by Wietse Venema
My conclusion is that Postfix can continue to provide basic policies
that avoid worst-case failure modes, but the choice of the settings
that control those policies is better left to the operator. If the
receiver slams on the brakes, then Postfix can suspend deliveries,
but the sender operator will have to adjust the sending rate.
exactly this is the point
thank you for your understanding and thoughts!
Suspending delivery and punting all messages from the active queue
for the designated nexthop is not a winning strategy. In this state
mail delivery to the destination is in most cases unlikely to
recover without manual intervention.
it is in the usecase of a DEDICATED newsletter relay
why should it not recover?

the request was "after 20 temp fails to the same destination
retry the next delivers to THIS destination FIVE MINUTES later"
Post by Viktor Dukhovni
I would posit that neither Reindl nor the OP, or that many others
really understand what they are asking for. If they understood,
they would stop asking for it.
i would posit you do not understand the usecase
Post by Viktor Dukhovni
When faced with a destination that imposes tight rate limits you
must pre-configure your MTA to always stay under the limits. Nothing
good happens when the Postfix output rate under load exceeds the
remote limit whether you throttle the queue repeatedly or not
smtp_destination_recipient_limit = 15
smtp_initial_destination_concurrency = 2
smtp_destination_concurrency_limit = 2
smtp_destination_concurrency_failed_cohort_limit = 5
smtp_destination_rate_delay = 1

so what should one do more?
the sending machine is whitelisted at the ISP
but the whitelisting does not affect rate-limits

and yes we do not care if a newsletter has reached every RCPT
two hours later but we do care for reputation and not exceed
rate limits of large ISP's
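For readers following along: the smtp_* settings quoted above can also be scoped to a dedicated per-transport copy, which is the usual way to slow down one destination without affecting the rest of the mail stream. A sketch, using a hypothetical transport named "slow" (the transport name is illustrative; the per-transport parameter-prefix mechanism and parameter names are standard Postfix, values are Reindl's):

```
# master.cf -- a dedicated delivery transport (the name "slow" is made up)
slow      unix  -       -       n       -       2       smtp

# main.cf -- per-transport overrides; the prefix is the transport name
slow_destination_recipient_limit = 15
slow_initial_destination_concurrency = 2
slow_destination_concurrency_limit = 2
slow_destination_concurrency_failed_cohort_limit = 5
slow_destination_rate_delay = 1s

# transport map entry routing the rate-limited domain to it, e.g.:
# example.com   slow:
```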
Viktor Dukhovni
2013-01-09 02:17:33 UTC
Permalink
Post by Reindl Harald
Post by Viktor Dukhovni
Suspending delivery and punting all messages from the active queue
for the designated nexthop is not a winning strategy. In this state
mail delivery to the destination is in most cases unlikely to
recover without manual intervention.
it is in the usecase of a DEDICATED newsletter relay
why should it not recover?
the request was "after 20 temp fails to the same destination
retry the next delivers to THIS destination FIVE MINUTES later"
That's not what happens when a destination is throttled, all mail
there is deferred, and is retried some indefinite time later that
is at least 5 minutes but perhaps a lot longer, and at great I/O
cost, with exponential backoff for each message based on time in the
queue, ...
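The exponential backoff described here can be approximated as follows. This is an illustrative sketch of the retry schedule, not the actual qmgr implementation, using the documented minimal_backoff_time / maximal_backoff_time defaults (300s / 4000s):

```python
# Illustrative approximation (NOT qmgr source): a deferred message's
# retry interval grows roughly with its time in the queue, doubling
# from minimal_backoff_time until capped at maximal_backoff_time.
def next_retry_interval(time_in_queue, minimal=300, maximal=4000):
    interval = minimal
    while interval < time_in_queue and interval < maximal:
        interval *= 2
    return min(interval, maximal)
```

So a message that has sat in the queue for hours is retried only about once per 66 minutes at the default cap, which is why a throttled backlog drains so slowly.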

To understand what one is asking for, one needs to understand the
scheduler (qmgr) architecture. Otherwise, one is just babbling
nonsense (no offense intended).
Post by Reindl Harald
Post by Viktor Dukhovni
I would posit that neither Reindl nor the OP, or that many others
really understand what they are asking for. If they understood,
they would stop asking for it.
i would posit you do not understand the usecase
How likely do you think that is? Of course I understand the use
case, in fact better than the users who are asking for it.
Post by Reindl Harald
and yes we do not care if a newsletter has reached every RCPT
two hours later but we do care for reputation and not exceed
rate limits of large ISP's
Throttling the destination (which means moving all pending messages
for the destination to deferred, where they age exponentially, while
more mail builds up...) is not the answer to your problem.

1. Get whitelisted without limits, send at the arrival rate.
2. Get whitelisted at above the arrival rate, set rate delay to
avoid exceeding the rate.
3. Don't waste time with unresponsive mailbox providers, tell their
customers their mailbox provider is not supported.
4. Snowshoe.

Pick the first one that is viable for you.
--
Viktor.
Reindl Harald
2013-01-09 02:27:57 UTC
Permalink
Post by Viktor Dukhovni
Post by Reindl Harald
the request was "after 20 temp fails to the same destination
retry the next delivers to THIS destination FIVE MINUTES later"
That's not what happens when a destination is throttled, all mail
there is deferred, and is retried some indefinite time later that
is at least 5 minutes but perhaps a lot longer, and at great I/O
cost, with exponential backoff for each message based on time in the
queue, ...
To understand what one is asking for, one needs to understand the
scheduler (qmgr) architecture. Otherwise, one is just babbling
nonsense (no offense intended).
and the request was whether the behavior can be controlled in
the future, not what the behavior currently is
Post by Viktor Dukhovni
Throttling the destination (which means moving all pending messages
for the destination to deferred, where they age exponentially, while
more mail builds up...) is not the answer to your problem.
sorry, but you really NOT understand the usecase

"while more mail builds up"
NO there is NO MORE MAIL built up

* DEDICATED NEWSLETTER MACHINE
* means large amount of mails one or two times a week
Post by Viktor Dukhovni
1. Get whitelisted without limits, send at the arrival rate
no option
Post by Viktor Dukhovni
2. Get whitelisted at above the arrival rate, set rate delay to
avoid exceeding the rate
you missed

smtp_destination_recipient_limit = 15
smtp_initial_destination_concurrency = 2
smtp_destination_concurrency_limit = 2
smtp_destination_concurrency_failed_cohort_limit = 10
smtp_destination_rate_delay = 1
Post by Viktor Dukhovni
3. Don't waste time with unresponsive mailbox providers, tell their
customers their mailbox provider is not supported.
reality check: you propose that I tell my customers to tell
their customers something, just because the mailadmin would
like to get rid of the permanent "try again later" messages
in his maillog

this will not happen in the real world
Rafael Azevedo - IAGENTE
2013-01-09 11:42:15 UTC
Permalink
Post by Wietse Venema
Post by Reindl Harald
Post by Wietse Venema
Big deal. Now I can block all mail for gmail.com by getting 100
email messages into your queue
how comes?
how do you get gmail.com answer to any delivery from you with 4xx?
He wants to temporarily suspend delivery when site has 5 consecutive
delivery errors without distinguishing between SMTP protocol stages
(such an "ignore protocol stage" switch could be added to Postfix).
To implement a trivial DOS, I need 5 consecutive messages in his
mail queue, plus a handful accounts that don't accept mail.
I have no idea where he got the 100 from - that number was not part
of his original problem description.
Wietse, I never said anything about 5 errors; you suggested that with the cohort parameter.

When delivery starts failing because of an active block, it's impossible to deliver any email after that. So it might happen that we have something like 10k emails to the same destination (domain).

As I said, if this were parametrized, we could set 100, 500, 1000 errors, whatever fits the needs.
Post by Wietse Venema
Wietse
Rafael
Rafael Azevedo - IAGENTE
2013-01-09 11:46:34 UTC
Permalink
Post by Viktor Dukhovni
Barring a clean "slow down" signal, and a stable feedback mechanism,
the only strategy is manually tuned rate delays, and spreading the
load over multiple sending IPs (Postfix instances don't help if
they share a single IP).
I have multiple instances of Postfix running on multiple IPs. The problem (not quite sure if it is a problem) is that we don't have a shared queue, so each postfix (ip) has its own queue. Is there any way to share the queue across multiple postfix (ip) instances? Does it make sense?

Rafael
Rafael Azevedo - IAGENTE
2013-01-09 11:51:22 UTC
Permalink
I agree with Reindl, I guess Witsie is now better understanding the problem here.

I'd see this as a "additional feature", not default configuration.

It would be even better if that could be parametrized on named transport basis.

- Rafael
Post by Reindl Harald
Post by Wietse Venema
My conclusion is that Postfix can continue to provide basic policies
that avoid worst-case failure modes, but the choice of the settings
that control those policies is better left to the operator. If the
receiver slams on the brakes, then Postfix can suspend deliveries,
but the sender operator will have to adjust the sending rate.
exactly this is the point
thank you for your understanding and thoughts!
Rafael Azevedo - IAGENTE
2013-01-09 11:54:19 UTC
Permalink
Post by Viktor Dukhovni
When faced with a destination that imposes tight rate limits you
must pre-configure your MTA to always stay under the limits. Nothing
good happens when the Postfix output rate under load exceeds the
remote limit whether you throttle the queue repeatedly or not.
But many times we just don't know the other side's limits, and watching logs every day searching for delivery failures, with all due respect, is very painful.
Post by Viktor Dukhovni
The best that one can hope for is for Postfix to dynamically apply
a rate delay that is guaranteed to be slow enough to get under the
limit, and then gradually reduce it.
That would be very nice.

- Rafael
Rafael Azevedo - IAGENTE
2013-01-09 12:02:02 UTC
Permalink
Post by Viktor Dukhovni
That's not what happens when a destination is throttled, all mail
there is deferred, and is retried some indefinite time later that
is at least 5 minutes but perhaps a lot longer, and at great I/O
cost, with expontial backoff for each message based on time in the
queue, …
I totally disagree with you. There would be more I/O with postfix trying to deliver while there's an active block. Messages are kept on disk/in memory for a much longer time, and Postfix keeps moving them from deferred to active and back to deferred again. Plus, you can't forget that more messages keep arriving while the block is still active, leading postfix into an **infinite** loop (until it reaches maximal_queue_lifetime)
Post by Viktor Dukhovni
To understand what one is asking for, one needs to understand the
scheduler (qmgr) architecture. Otherwise, one is just babbling
nonsense (no offense intended).
Where can I read more about this?
Post by Viktor Dukhovni
Post by Reindl Harald
Post by Viktor Dukhovni
I would posit that neither Reindl nor the OP, or that many others
really understand what they are asking for. If they understood,
they would stop asking for it.
i would posit you do not understand the usecase
How likely do you think that is? Of course I understand the use
case, in fact better than the users who are asking for it.
Sorry Viktor, but I'm not sure about that. You keep saying to get whitelist like if it would be very easy. Believe me, there are a lot of companies that don't have any support for that.
Post by Viktor Dukhovni
Post by Reindl Harald
and yes we do not care if a newsletter has reached every RCPT
two hours later but we do care for reputation and not exceed
rate limits of large ISP's
Throttling the destination (which means moving all pending messages
for the destinatin to deferred, where they age exponentially, while
more mail builds up...) is not the answer to your problem.
But why move all the "active" queue to deferred? Wouldn't it be better to just move it to hold queue?
Post by Viktor Dukhovni
1. Get whitelisted without limits, send at the arrival rate.
2. Get whitelisted at above the arrival rate, set rate delay to
avoid exceeding the rate.
3. Don't waste time with unresponsive mailbox providers, tell their
customers their mailbox provider is not supported.
4. Snowshoe.
What's the meaning of "Snowshoe"?

- Rafael
Michael P. Demelbauer
2013-01-09 14:03:43 UTC
Permalink
On Wed, Jan 09, 2013 at 10:02:02AM -0200, Rafael Azevedo - IAGENTE wrote:
[ ... ]
Post by Rafael Azevedo - IAGENTE
Post by Viktor Dukhovni
To understand what one is asking for, one needs to understand the
scheduler (qmgr) architecture. Otherwise, one is just babbling
nonsense (no offense intended).
Where can I read more about this?
I think

http://www.postfix.org/SCHEDULER_README.html

was already mentioned in that thread?!

I don't know anything else either.
--
Michael P. Demelbauer
Systemadministration
WSR
Arsenal, Objekt 20
1030 Wien
-------------------------------------------------------------------------------
In Germany the spectator enjoys leisure at a high level; here
you have frozen fingers and sit with a cold backside on the
steel grandstand.
-- Helmut Kraft, asked what would have to be done to make
Austria's football attractive again.
Viktor Dukhovni
2013-01-09 14:58:00 UTC
Permalink
Post by Rafael Azevedo - IAGENTE
Post by Viktor Dukhovni
That's not what happens when a destination is throttled, all mail
there is deferred, and is retried some indefinite time later that
is at least 5 minutes but perhaps a lot longer, and at great I/O
cost, with expontial backoff for each message based on time in the
queue, …
I totally disagree with you. It would have more I/O having postfix
trying to deliver when there's an active block. Messages are kept
in disk/memory for much longer time, and Postfix keeps putting it
from deferred to active and then back to deferred again. Plus, you
can't forget that there are more messages coming in the mean time
while the block is still active, leading postfix to a **infinite**
loop (until it reaches maximal_queue_lifetime)
The part you're missing is that when Postfix "stops sending" the
only mechanism in place other than a rate delay is "throttling"
the destination queue, which moves every message (even those that
have not been tried yet), from "active" to "deferred", and the
same happens to any message that happens to arrive while the
queue is throttled.

The delivery rate to the destination will be a small number of
messages per $maximal_backoff_time, this is not terribly useful,
with the entire backlog shuffling between the active and deferred
queues, without any useful work being done.
Post by Rafael Azevedo - IAGENTE
Post by Viktor Dukhovni
To understand what one is asking for, one needs to understand the
scheduler (qmgr) architecture. Otherwise, one is just babbling
nonsense (no offense intended).
Where can I read more about this?
SCHEDULER_README, the queue manager man page, and source code.
Post by Rafael Azevedo - IAGENTE
Post by Viktor Dukhovni
Post by Reindl Harald
i would posit you do not understand the usecase
How likely do you think that is? Of course I understand the use
case, in fact better than the users who are asking for it.
Sorry Viktor, but I'm not sure about that. You keep saying to
get whitelist like if it would be very easy. Believe me, there are
a lot of companies that don't have any support for that.
I listed all the options available to you, one which "whitelisting"
is always the best when possible. When not possible you try one
of the others.
Post by Rafael Azevedo - IAGENTE
Post by Viktor Dukhovni
Throttling the destination (which means moving all pending messages
for the destination to deferred, where they age exponentially, while
more mail builds up...) is not the answer to your problem.
But why move all the "active" queue to deferred? Wouldn't it be
better to just move it to hold queue?
This suspends delivery of a (multi-recipient) message for all
deferred destinations, not just the ones you want treated specially.

Messages on hold don't get retried without manual intervention, that
makes your delivery rate to the destination zero, not a good way
to get the mail out.
Post by Rafael Azevedo - IAGENTE
Post by Viktor Dukhovni
1. Get whitelisted without limits, send at the arrival rate.
2. Get whitelisted at above the arrival rate, set rate delay to
avoid exceeding the rate.
3. Don't waste time with unresponsive mailbox providers, tell their
customers their mailbox provider is not supported.
4. Snowshoe.
What's the meaning of "Snowshoe"?
Spread the load over sufficiently many outbound systems that the
rate limits are not exceeded by any of them. A fifth option is
to outsource to others who've already done that.
--
Viktor.
Rafael Azevedo - IAGENTE
2013-01-09 15:29:06 UTC
Permalink
I was watching my log files just now looking for deferred errors, and to my surprise, we got temporarily blocked by Yahoo on some SMTPs (ips), as shown:

Jan 9 13:20:52 mxcluster yahoo/smtp[8593]: 6731A13A2D956: host mta5.am0.yahoodns.net[98.136.216.25] refused to talk to me: 421 4.7.0 [TS02] Messages from X.X.X.X temporarily deferred - 4.16.56.1; see http://postmaster.yahoo.com/errors/421-ts02.html

So guess what, I still have another 44k messages in the active queue (a lot of them are probably to yahoo) and postfix is wasting its time and cpu trying to deliver to Yahoo while there's an active block.

Yahoo suggests trying again in a few hours, but we'll never get rid of the block if we keep trying while the block is active.

This doesn't happen only with bulk senders. Many people use their hosting company to send a few hundred emails together with many other users sending legitimate mail from their mail clients… Eventually, one user will compromise the whole infrastructure and many people may have problems delivering their messages.

There's gotta be a solution for this.

- Rafael
John Peach
2013-01-09 15:31:45 UTC
Permalink
On Wed, 9 Jan 2013 13:29:06 -0200
Post by Rafael Azevedo - IAGENTE
I was watching my log files now looking for deferred errors, and for
my surprise, we got temporary blocked by Yahoo on some SMTPs (ips),
Jan 9 13:20:52 mxcluster yahoo/smtp[8593]: 6731A13A2D956: host
mta5.am0.yahoodns.net[98.136.216.25] refused to talk to me: 421 4.7.0
[TS02] Messages from X.X.X.X temporarily deferred - 4.16.56.1; see
http://postmaster.yahoo.com/errors/421-ts02.html
So guess what, I still have another 44k messages on active queue (a
lot of them are probably to yahoo) and postfix is wasting its time
and cpu trying to deliver to Yahoo when there's an active block.
Yahoo suggests to try delivering in few hours, but we'll never get
rid from the block if we keep trying while the block is active.
This doesn't happens only with bulk senders. Many people use their
hosting company to send few hundreds emails together with many other
users sending legitimate mails from their mail clients… Eventually,
one user will compromise all infrastructure and many people may have
problem delivering their messages.
There's gotta be a solution for this.
There is - you need to register your mailserver(s) with yahoo.
Post by Rafael Azevedo - IAGENTE
- Rafael
Rafael Azevedo - IAGENTE
2013-01-09 15:37:00 UTC
Permalink
Post by John Peach
Post by Rafael Azevedo - IAGENTE
There's gotta be a solution for this.
There is - you need to register your mailserver(s) with yahoo
You mean Yahoo's Feedback Program (feedbackloop.yahoo.net) ?

- Rafael
John Peach
2013-01-09 15:45:59 UTC
Permalink
On Wed, 9 Jan 2013 13:37:00 -0200
Post by Rafael Azevedo - IAGENTE
Post by John Peach
Post by Rafael Azevedo - IAGENTE
There's gotta be a solution for this.
There is - you need to register your mailserver(s) with yahoo
You mean Yahoo's Feedback Program (feedbackloop.yahoo.net) ?
I forget exactly what needs doing, but you definitely need DKIM records
and to register with their feedbackloop:

http://help.yahoo.com/kb/index?page=content&y=PROD_MAIL_ML&locale=en_US&id=SLN3435&impressions=true
Post by Rafael Azevedo - IAGENTE
- Rafael
Viktor Dukhovni
2013-01-09 15:48:16 UTC
Permalink
Post by Rafael Azevedo - IAGENTE
I was watching my log files now looking for deferred errors, and
for my surprise, we got temporary blocked by Yahoo on some SMTPs
Jan 9 13:20:52 mxcluster yahoo/smtp[8593]: 6731A13A2D956: host mta5.am0.yahoodns.net[98.136.216.25] refused to talk to me: 421 4.7.0 [TS02] Messages from X.X.X.X temporarily deferred - 4.16.56.1; see http://postmaster.yahoo.com/errors/421-ts02.html
Postfix already treats this as a don't send signal. Enough of these
back to back and transmission stops. This is a 421 during HELO,
not a 4XX during RCPT TO.

Yahoo's filters are NOT simple rate limits. They delay delivery when
their reputation system wants more time to assess the source. They
typically will permit delayed message when they're retried, unless
of course they believe the source to be spamming, in which case they
may reject, or quarantine...
Post by Rafael Azevedo - IAGENTE
So guess what, I still have another 44k messages on active queue
(a lot of them are probably to yahoo) and postfix is wasting its
time and cpu trying to deliver to Yahoo when there's an active
block.
Yahoo suggests to try delivering in few hours, but we'll never
get rid from the block if we keep trying while the block is active.
This is false. Postfix does not "keep trying" under the above
conditions, and Yahoo does not rate-limit in the naive manner you
imagine.
Post by Rafael Azevedo - IAGENTE
This doesn't happens only with bulk senders. Many people use
their hosting company to send few hundreds emails together with
many other users sending legitimate mails from their mail clients…
Eventually, one user will compromise all infrastructure and many
people may have problem delivering their messages.
This is rarely a problem, and when it is, any blocking is usually
transient, and one can request to be unblocked, at most providers.
Post by Rafael Azevedo - IAGENTE
There's gotta be a solution for this.
Yes, but not the one you're asking for. It is, I think, possible to
design and implement a useful dynamic rate delay algorithm; I am
not sure that spending the effort to optimize Postfix for unwhitelisted
bulk email is a good use of developer effort.
--
Viktor.
Wietse Venema
2013-01-09 16:20:48 UTC
Permalink
Post by Wietse Venema
My conclusion is that Postfix can continue to provide basic policies
that avoid worst-case failure modes, but the choice of the settings
that control those policies is better left to the operator. If the
receiver slams on the brakes, then Postfix can suspend deliveries,
but the sender operator will have to adjust the sending rate.
I agree with Reindl, I guess Witsie is now better understanding
the problem here.
Please take the effort to spell my name correctly.

When a site sends a small volume of mail, the existing Postfix
strategy is sufficient (skip a site after N connect/handshake errors,
don't treat a post-handshake error as a "stay away" signal). The
email will eventually get through.

When a site sends a large volume of mail to a rate-limited destination,
"we" believe that a strategy based on a bursty send-suspend cycle
will perform worse than a strategy based on an uninterrupted flow.

Why does this difference matter? Once the sending rate drops under
the rate at which mail enters the mail queue, all strategies become
equivalent to throwing away mail.

This is why bulk mailers should use a strategy based on an uninterrupted
flow, instead of relying on a bursty send-suspend cycle.

This is consistent with my conclusion cited above. The sole benefit
of adding the switch is that when it trips, the operator knows they
need a different sending strategy (reduce rates, snowshoe, whatever).

Wietse
Rafael Azevedo - IAGENTE
2013-01-09 16:20:50 UTC
Permalink
John,

We've already done that.
We do sign ALL messages with DKIM and are also subscribed for Yahoo Feedback Loop Program.

Still there are a few messages being blocked based on user complaints or "unusual traffic from the IP xxx"…

- Rafael
Post by John Peach
On Wed, 9 Jan 2013 13:37:00 -0200
Post by Rafael Azevedo - IAGENTE
Post by John Peach
Post by Rafael Azevedo - IAGENTE
There's gotta be a solution for this.
There is - you need to register your mailserver(s) with yahoo
You mean Yahoo's Feedback Program (feedbackloop.yahoo.net) ?
I forget exactly what needs doing, but you definitely need DKIM records
http://help.yahoo.com/kb/index?page=content&y=PROD_MAIL_ML&locale=en_US&id=SLN3435&impressions=true
Post by Rafael Azevedo - IAGENTE
- Rafael
Rafael Azevedo - IAGENTE
2013-01-09 16:29:21 UTC
Permalink
Post by Viktor Dukhovni
Post by Rafael Azevedo - IAGENTE
I was watching my log files now looking for deferred errors, and
for my surprise, we got temporary blocked by Yahoo on some SMTPs
Jan 9 13:20:52 mxcluster yahoo/smtp[8593]: 6731A13A2D956: host mta5.am0.yahoodns.net[98.136.216.25] refused to talk to me: 421 4.7.0 [TS02] Messages from X.X.X.X temporarily deferred - 4.16.56.1; see http://postmaster.yahoo.com/errors/421-ts02.html
Postfix already treats this as a don't send signal. Enough of these
back to back and transmission stops. This is a 421 during HELO,
not a 4XX during RCPT TO.
So please, tell me what I am doing wrong, because my postfix servers keep trying even after this failure. At this moment I have over 30k emails to yahoo in the deferred queue, all with the same error.
Post by Viktor Dukhovni
Yahoo's filters are NOT simple rate limits. They delay delivery when
their reputation system wants more time to assess the source. They
typically will permit delayed message when they're retried, unless
of course they believe the source to be spamming, in which case they
may reject, or quarantine…
I agree with that.
Post by Viktor Dukhovni
Post by Rafael Azevedo - IAGENTE
So guess what, I still have another 44k messages on active queue
(a lot of them are probably to yahoo) and postfix is wasting its
time and cpu trying to deliver to Yahoo when there's an active
block.
Yahoo suggests to try delivering in few hours, but we'll never
get rid from the block if we keep trying while the block is active.
This is false. Postfix does not "keep trying" under the above
conditions, and Yahoo does not rate-limit in the naive manner you
imagine.
My postfix does keep trying. Any idea about why this is happening?
Post by Viktor Dukhovni
Post by Rafael Azevedo - IAGENTE
This doesn't happens only with bulk senders. Many people use
their hosting company to send few hundreds emails together with
many other users sending legitimate mails from their mail clients?
Eventually, one user will compromise all infrastructure and many
people may have problem delivering their messages.
This is rarely a problem, and when it is, any blocking is usually
transient, and one can request to be unblocked, at most providers.
"Most" in this case might not be enough.
Post by Viktor Dukhovni
Post by Rafael Azevedo - IAGENTE
There's gotta be a solution for this.
Yes, but not the one you're asking for. It is I think possible to
design and implement a useful dynamic rate delay algorithm, I am
not sure that spending the effort to optimize Postfix for unwhitelisted
bulk email is a good use of developer effort.
I'm 100% sure that this doesn't happen only with bulk senders. Legitimate mail is also subject to being blocked because of bad emails.

Last week a customer's server got compromised, somebody uploaded a bulk-php-script that started sending thousands of emails in a very small time frame, blocking all legitimate emails from that time on up to few hours.

- Rafael
Wietse Venema
2013-01-09 16:30:47 UTC
Permalink
Post by Rafael Azevedo - IAGENTE
I was watching my log files now looking for deferred errors, and
for my surprise, we got temporary blocked by Yahoo on some SMTPs
Jan 9 13:20:52 mxcluster yahoo/smtp[8593]: 6731A13A2D956: host
mta5.am0.yahoodns.net[98.136.216.25] refused to talk to me: 421
4.7.0 [TS02] Messages from X.X.X.X temporarily deferred - 4.16.56.1;
see http://postmaster.yahoo.com/errors/421-ts02.html
As required by RFC the Postfix SMTP client will try another MX
host when it receives a 4xx greeting.

Postfix limits the number of MX hosts to try.

When all greetings fail with 4xx or whatever then Postfix will
suspend deliveries.

Therefore, you are talking out of your exhaust pipe when you
claim that Postfix keeps trying.

Wietse
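The MX-host limiting Wietse describes is configurable. A minimal main.cf fragment (these are real Postfix parameter names; the values shown are the documented defaults):

```
# main.cf -- bound how many MX hosts the SMTP client will try per delivery
smtp_mx_address_limit = 5    # consider at most 5 MX addresses
smtp_mx_session_limit = 2    # actually connect to at most 2 of them
```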
Rafael Azevedo - IAGENTE
2013-01-09 16:39:15 UTC
Permalink
Post by Wietse Venema
Post by Rafael Azevedo - IAGENTE
I agree with Reindl, I guess Witsie is now better understanding
the problem here.
Please take the effort to spell my name correctly.
Sorry about that, Wietse. It was a typo; I didn't intend to offend you.
Post by Wietse Venema
When a site sends a small volume of mail, the existing Postfix
strategy is sufficient (skip a site after N connect/handshake errors,
don't treat a post-handshake error as a "stay away" signal). The
email will eventually get through.
When a site sends a large volume of mail to a rate-limited destination,
"we" believe that a strategy based on a bursty send-suspend cycle
will perform worse than a strategy based on an uninterrupted flow.
This will eventually get the sending server (ip) blocked, and it also hurts the IP's reputation: since on average the mail server is not sending email all the time, suddenly starting to send a huge volume makes it easy for receiving servers to flag it as a spam source when it's not.
Post by Wietse Venema
Why does this difference matter? Once the sending rate drops under
rate at which mail enters the mail queue, all strategies become
equivalent to throwing away mail.
I'm trying to understand what you said, but it doesn't make any sense to me.
Today, when I have a huge deferred queue, I just put it all on hold, and about 4 to 6 hours later I release it all back to the active queue, and guess what: many messages get sent successfully. The main problem with this "solution" (if we can call it that) is when new messages are still coming in, which does not give the IP enough time to "breathe".
Post by Wietse Venema
This is why bulk mailers should use a strategy based on an uninterrupted
flow, instead of relying on a bursty send-suspend cycle.
My experience tells me that this will help to get blocked very easily.

- Rafael
Rafael Azevedo - IAGENTE
2013-01-09 16:44:17 UTC
Permalink
Post by Wietse Venema
Post by Rafael Azevedo - IAGENTE
I was watching my log files now looking for deferred errors, and
for my surprise, we got temporary blocked by Yahoo on some SMTPs
Jan 9 13:20:52 mxcluster yahoo/smtp[8593]: 6731A13A2D956: host
mta5.am0.yahoodns.net[98.136.216.25] refused to talk to me: 421
4.7.0 [TS02] Messages from X.X.X.X temporarily deferred - 4.16.56.1;
see http://postmaster.yahoo.com/errors/421-ts02.html
As required by RFC the Postfix SMTP client will try another MX
host when it receives a 4xx greeting.
Postfix limits the number of MX hosts to try.
When all greetings fail with 4xx or whatever then Postfix will
suspend deliveries.
I have no idea what I'm doing wrong; this really doesn't happen on my servers.

In my case, Postfix keeps trying to deliver the messages in the active queue to the same destination, not to mention the deferred queue, which never stops sending after getting this error. I mean, Postfix picks the next message (regardless of destination) and tries again and again.
Post by Wietse Venema
Therefore, you are talking out of your exhaust pipe when you
claim that Postfix keeps trying.
Sorry, my english is not that good, I didn't understand what you mean.

- Rafael
Rafael Azevedo - IAGENTE
2013-01-09 16:48:27 UTC
Permalink
Now Yahoo is giving another response:

said: 451 Message temporarily deferred - [160] (in reply to end of DATA command)

See, this is very hard to solve. I'm really trying to better understand the problem in order to find the best solution. I'd like to thank you in advance for the help; it is very much appreciated.

- Rafael
Wietse Venema
2013-01-09 16:52:17 UTC
Permalink
Post by Rafael Azevedo - IAGENTE
Post by Wietse Venema
Why does this difference matter? Once the sending rate drops under
the rate at which mail enters the mail queue, all strategies become
equivalent to throwing away mail.
I'm trying to understand what you said but it doesn't make any sense to me.
When you can send N messages per day, and your queue receives M>N
messages per day, then you will throw away M-N messages per day.

If you achieve the sending rate by driving at full speed into the
wall and waiting for 6 hours, then your N will be smaller, and
you will throw away more mail.

Wietse
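Wietse's arithmetic can be made concrete with a small model. The numbers below (720 accepted messages per hour when sending steadily, a 6-hour deferral after a burst) are illustrative assumptions for this thread's scenario, not documented limits of any receiver:

```python
# Hedged sketch: compare the daily throughput of a steady-rate sender
# with a burst-then-suspend sender, under an assumed receiver policy.

SECONDS_PER_DAY = 24 * 3600

def steady_throughput(rate_delay_s: float) -> int:
    """Messages/day when sending one message every rate_delay_s seconds."""
    return int(SECONDS_PER_DAY // rate_delay_s)

def bursty_throughput(burst_size: int, burst_duration_s: float,
                      penalty_s: float) -> int:
    """Messages/day when sending burst_size messages over
    burst_duration_s seconds, then being deferred for penalty_s
    seconds, repeating the cycle."""
    cycle = burst_duration_s + penalty_s
    full_cycles = int(SECONDS_PER_DAY // cycle)
    return full_cycles * burst_size

steady = steady_throughput(5)                    # one message every 5 s
bursty = bursty_throughput(720, 3600, 6 * 3600)  # 720 msgs/h, then 6 h block

# steady = 17280 messages/day; bursty = 2160 (only 3 full 7-hour
# cycles fit in a day). The difference is mail that gets thrown away.
```

This is the sense in which "driving at full speed into the wall" makes N smaller: the suspension time dominates the cycle.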
Wietse Venema
2013-01-09 16:54:23 UTC
Permalink
Post by Rafael Azevedo - IAGENTE
Post by Wietse Venema
When all greetings fail with 4xx or whatever then Postfix will
suspend deliveries.
I have no idea about what I'm doing wrong, this really doesn't
happen in my servers.
No it doesn't. Postfix logs "delivery temporarily suspended" and
skips Yahoo until the "dead host" timer expires.

Wietse
