Resolv class caching /etc/hosts entries

Hey guys,

I ran into a subtle bug recently where I did not realize that the
Resolv class actually caches the contents of the /etc/hosts file. This
is problematic, as if the resolution is done in a long-running
process, any changes are not visible to the process unless the class
is reloaded or the process is restarted. I was wondering if anyone
knows what the rationale is for caching the contents and why there are
no checks to see if the file has been modified? For now, I've worked
around this by calling Socket::getaddrinfo, as getaddrinfo does the
right thing.
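
A minimal sketch of the behaviour described above, assuming a Ruby in which Resolv::Hosts parses its file on the first lookup and keeps the result for the life of the object, as reported in this message. A temporary file stands in for /etc/hosts; the hostname example.test and the method name hosts_cache_demo are made up for illustration.

```ruby
require 'resolv'
require 'tempfile'

# Hypothetical demo: a temp file stands in for /etc/hosts.
def hosts_cache_demo
  Tempfile.create('hosts') do |f|
    f.write("127.0.0.1 example.test\n")
    f.flush

    cached = Resolv::Hosts.new(f.path)
    first  = cached.getaddress('example.test') # file is parsed on this first lookup

    # Change the entry after the first lookup has happened.
    File.write(f.path, "127.0.0.2 example.test\n")

    {
      first:        first,
      # Same object again: with the caching described here, this is
      # expected to still report the old address.
      same_object:  cached.getaddress('example.test'),
      # A brand-new object has to read the file again.
      fresh_object: Resolv::Hosts.new(f.path).getaddress('example.test')
    }
  end
end

p hosts_cache_demo
```

Comparing the same_object and fresh_object values on a given Ruby version shows whether that version re-reads the file or not.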

···

--
Cheers,
Timur

Hi Timur,

If you're working with DNS you're going to run into all sorts of caching fun in general, so it's a good idea to be prepared for it. Applications (e.g. browsers), operating systems, ISPs, and nameservers all do their own caching of DNS results. Did you know, for example, that the data could be over a week out of date because you are using an ISP that disregards TTLs, and your browser has been caching previous results to remain responsive?

The general rationale is that always having the latest information is expensive, whether in the amount of data, the time taken to parse /etc/hosts, or the initial wait in making the request. Resolver libraries are usually called very frequently with exactly the same data. To stay responsive, a library typically makes the first request of a given kind and waits for the response; later requests of the same kind are answered from cached results rather than waiting each time.

I can't speak for the Resolv class myself, as I've never used it, but the rationale is probably similar. If you've got a hard requirement such as checking /etc/hosts for changes, you might want to consider a wrapper or proxy class that watches the timestamp of /etc/hosts and then does whatever you need: restarting OS-level resolvers, perhaps dropping and recreating the Resolv object, or somehow forcing it to drop its cache. Exactly what needs to be done will depend on the problem you are trying to solve.
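
One way the wrapper idea could look, as a rough sketch only. The class name ReloadingHosts is hypothetical and not part of Resolv; it assumes that building a new Resolv::Hosts object is enough to pick up a changed file, and it compares the file's modification time on every lookup.

```ruby
require 'resolv'

# Hypothetical wrapper (not part of the standard library): rebuilds the
# underlying Resolv::Hosts object whenever the hosts file's mtime changes.
class ReloadingHosts
  def initialize(path = Resolv::Hosts::DefaultFileName)
    @path  = path
    @mutex = Mutex.new
    @mtime = nil
    @hosts = nil
  end

  # Same call shape as Resolv::Hosts#getaddress.
  def getaddress(name)
    current.getaddress(name)
  end

  private

  # Returns a Resolv::Hosts that was created after the last observed
  # modification of the file.
  def current
    @mutex.synchronize do
      mtime = File.mtime(@path)
      if @hosts.nil? || mtime != @mtime
        @hosts = Resolv::Hosts.new(@path)
        @mtime = mtime
      end
      @hosts
    end
  end
end
```

This costs one stat call per lookup, and it can miss an edit that lands within the filesystem's timestamp granularity. It also only covers the hosts file, not any DNS results cached elsewhere.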

Hope this helps. :)

Cheers,
Garth

···

On 07/03/13 09:11, Timur Alperovich wrote:

Hey guys,

I ran into a subtle bug recently where I did not realize that the
Resolv class actually caches the contents of the /etc/hosts file. This
is problematic, as if the resolution is done in a long-running
process, any changes are not visible to the process unless the class
is reloaded or the process is restarted. I was wondering if anyone
knows what the rationale is for caching the contents and why there are
no checks to see if the file has been modified? For now, I've worked
around this by calling Socket::getaddrinfo, as getaddrinfo does the
right thing.

Hi Garth,

Thanks for taking the time to reply!

Hi Timur,

If you're working with DNS you're going to run into all sorts of caching fun
in general, so it's a good idea to be prepared for it. Applications (eg.
browsers), operating systems, ISPs, and nameservers all do their own caching
of DNS results. Did you know, for example, that the data could be over a
week out-of-date because you are using an ISP that disregards TTLs, and your
browser has been caching previous results to remain responsive?

Yes, I do realize that propagating updates can take a significant
amount of time. However, eventually, they should be propagated -- the
behavior you're describing sounds like a bug.

The general rationale is that it is expensive to always have the latest
information available- in terms of amount of data, or just the time taken to
parse /etc/hosts, or just the initial wait in making the request. Resolver
libraries are usually called very frequently with exactly the same data. To
keep libraries responsive, usually the first request of a sort is made, and
the resolver waits for a response. The next similar requests use cached
results, rather than waiting for a response each time.

Agreed, however, the cache should eventually be invalidated if there
is an update to ensure correctness.

I can't speak for the Resolv class myself- I've never used it- but the
rationale is probably similar. If you've got a hard requirement such as
checking /etc/hosts for changes, you might want to consider a wrapper or
proxy class that watches /etc/hosts for changes in its timestamp, and then
does whatever you need- restarting OS-level resolvers, perhaps dropping and
recreating the Resolv object, or somehow forcing it to drop its cache.
Exactly what needs to be done will depend on the problem you are trying to
solve.

As I mentioned, one workaround is to call getaddrinfo, since glibc
does not cache getaddrinfo responses.

However, I was trying to point out the larger issue here: the Resolv
class (and the Dnsruby gem), which expose these operations through
class methods, would never recover in case of /etc/hosts being
updated. This is not a delay in propagation, but actually incorrect
behavior -- through looking at the source, I didn't see anything that
would cause the cache to be either invalidated or updated. That in
itself appears like a bug, so I wanted to see if this was a conscious
decision and whether there are any plans to address it. Maybe "talk"
is not the best venue for it?

You are correct, however, to point out that one can work around it by
reloading the class on every query, but that seems like overkill?
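
For reference, the getaddrinfo workaround mentioned above could be sketched roughly as follows. The helper name system_lookup is made up for illustration; the lookup goes through the operating system's resolver instead of Resolv, so whatever caching the OS layers do or do not perform still applies.

```ruby
require 'socket'

# Hypothetical helper: resolve a name through the system resolver
# (getaddrinfo) and return the unique IP address strings.
def system_lookup(name)
  Socket.getaddrinfo(name, nil, nil, Socket::SOCK_STREAM)
        .map { |info| info[3] } # element 3 of each entry is the numeric address
        .uniq
end

# Example call (result depends on the local configuration):
#   system_lookup('localhost')
```

Restricting the socket type to SOCK_STREAM avoids getting one entry per socket type for the same address.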

···

On Wed, Mar 6, 2013 at 6:39 PM, Garthy D <garthy_lmkltybr@entropicsoftware.com> wrote:

Hope this helps. :slight_smile:

Cheers,
Garth

--
Cheers,
Timur

Hi Timur,

Thanks for taking the time to reply!

Not a problem.

If you're working with DNS you're going to run into all sorts of caching fun
in general, so it's a good idea to be prepared for it. Applications (eg.
browsers), operating systems, ISPs, and nameservers all do their own caching
of DNS results. Did you know, for example, that the data could be over a
week out-of-date because you are using an ISP that disregards TTLs, and your
browser has been caching previous results to remain responsive?

Yes, I do realize that propagating updates can take a significant
amount of time. However, eventually, they should be propagated -- the
behavior you're describing sounds like a bug.

The browser behaviour is an oddity. It's what you actually want most of the time, especially if the operating system lookup is sluggish (might be true on Windows? I'm not sure), but if you're doing something that involves messing about with hosts, it's positively painful to work with.

For the ISP case, it's poor behaviour on the part of the ISP rather than a bug. :} None *should* ignore TTLs, because used properly they are incredibly useful, especially when migrating hosts. But some do. It drove me crazy when I had to deal with it: lazy behaviour on the part of ISPs of which you aren't even a customer can have nasty effects if your *clients* are using that ISP.

I can't speak for the Resolv class myself- I've never used it- but the
rationale is probably similar. If you've got a hard requirement such as
checking /etc/hosts for changes, you might want to consider a wrapper or
proxy class that watches /etc/hosts for changes in its timestamp, and then
does whatever you need- restarting OS-level resolvers, perhaps dropping and
recreating the Resolv object, or somehow forcing it to drop its cache.
Exactly what needs to be done will depend on the problem you are trying to
solve.

As I mentioned, one workaround is to call getaddrinfo, since glibc
does not cache getaddrinfo responses.

However, I was trying to point out the larger issue here: the Resolv
class (and the Dnsruby gem), which expose these operations through
class methods, would never recover in case of /etc/hosts being
updated. This is not a delay in propagation, but actually incorrect
behavior -- through looking at the source, I didn't see anything that
would cause the cache to be either invalidated or updated. That in
itself appears like a bug, so I wanted to see if this was a conscious
decision and whether there are any plans to address it. Maybe "talk"
is not the best venue for it?

You are correct, however, to point out that one can work around it by
reloading the class on every query, but that seems like overkill?

Oh yes, it's definitely overkill. I was just saying how I'd work around it if I were faced with a similar problem and needed a solution in the face of such a shortcoming. :)

I'd definitely agree that if there is no way to clear the cache short of dropping the whole object, then there is an issue/shortcoming/bug.

Another thing to bear in mind is that the fault might not be in Resolv (it might be, I'm not familiar with it). On Linux, for example, gethostbyname (apparently) goes through nscd, which provides its own caching. I know there have been times in the past when I've disabled that, where the caching has caused more problems than it solved. In this case, even being able to drop the Resolv cache wouldn't be enough, because it is layered on something else that is providing caching.

Now this is all theoretical, but I'm just bringing up something that might be happening. Is the fault actually with Resolv? If not, you're going to need to find where the problem occurs, and find a way around that issue. If yes, it does seem like the lack of ability to drop the cache is either a shortcoming or a bug.

But in either case, what to do? Suppose no immediate fix is available, or you need something that works generally. You might need to add some logic to your app to handle the additional hard requirement you have (immediate update if /etc/hosts changes), to ensure that no matter the details of the underlying implementation, your app behaves as it should. The audience for Resolv might be more geared toward simpler use cases employed by browsers and net apps, where ongoing indefinite caching is sufficient.

Having said that, I definitely don't want to derail any intended discussion on any shortcomings of the Resolv class. Please don't take it that way. :) I'm just running through some possible concerns and solutions. I'm not suggesting that Resolv is completely fine and shouldn't be changed. From what you've described, it sounds like there is an issue in there that needs to be addressed: being able to drop the cache at a minimum, and detecting source changes (e.g. /etc/hosts) at best.

Cheers,
Garth

···

On 07/03/13 18:15, Timur Alperovich wrote:

On Wed, Mar 6, 2013 at 6:39 PM, Garthy D <garthy_lmkltybr@entropicsoftware.com> wrote:

Hi Garth,

Another thing to bear in mind is that the fault might not be in Resolv
(it might be, I'm not familiar with it). On Linux, for example,
gethostbyname (apparently) goes through nscd, which is providing its own
caching. I know there have been times in the past when I've disabled that,
where the caching has caused more problems than it solved. In this case,
even being able to drop the Resolv cache wouldn't be enough, because it is
layered on something else that is providing caching. Now this is all
theoretical, but I'm just bringing up something that might be happening. Is
the fault actually with Resolv? If not, you're going to need to find where
the problem occurs, and find a way around that issue. If yes, it does seem
like the lack of ability to drop the cache is either a shortcoming or bug.
But in either case, what to do? Suppose no immediate fix is available, or
you need something that works generally. You might need to add some logic
to your app to handle the additional hard requirement you have (immediate
update if /etc/hosts changes), to ensure that no matter the details of the
underlying implementation, your app behaves as it should. The audience for
Resolv might be more geared toward simpler use cases employed by browsers
and net apps, where ongoing indefinite caching is sufficient.

You may be right about the audience for the module being geared toward
shorter-lived applications. I did look through both the code and published
API, as well as trying some tests of my own. I did not find a solution to
this caching issue, which is why I looked to the list for insight. Should I
ask the same question on the core ruby list?

Having said that, I definitely don't want to derail any intended
discussion on any shortcomings of the Resolv class. Please don't take it
that way. :) I'm just running through some possible concerns and
solutions. I'm not suggesting that Resolv is completely fine and shouldn't
be changed. From what you've described, it sounds like there is an issue
in there that needs to be addressed: being able to drop the cache at a
minimum, and detecting source changes (e.g. /etc/hosts) at best.

Agreed. I appreciate you trying to help. At this point, however, I'd like
to figure out the best place to figure out what's going on with that gem
(core mailing list?).

···

On Mar 7, 2013 12:44 AM, "Garthy D" <garthy_lmkltybr@entropicsoftware.com> wrote:

Hi Timur,

Agreed. I appreciate you trying to help. At this point, however, I'd
like to figure out the best place to figure out what's going on with
that gem (core mailing list?).

I can't say for sure. However, in your shoes I'd probably start here (i.e. ruby-talk). If I had no further luck, under the circumstances, I'd consider whether filing a bug or feature request here would be appropriate:

https://bugs.ruby-lang.org/

I believe this would end up on ruby-core too.

Somebody else might have some better suggestions to add though?

Cheers,
Garth

···

On 09/03/13 14:34, Timur Alperovich wrote: