[ruby-talk:444294] Alternative logging solutions

Presently our applications use the standard Logger package that comes with
Ruby, and it works just fine. We use it in conjunction with the standard
Unix logrotate. Up to now everything has been working as expected.

Until logrotate hit in the middle of some significant processing and a
client had a query. What we found was a 2-second gap in the logs. The end
of the old log was the start of a transaction and the start of the new log
was the end of a completely different transaction. So we have lost partial
information from two transactions, and it is possible that there was at
least one transaction between these two that was lost completely. Each
transaction takes around 1 second to process, sometimes less.

This time it was not too big an issue and there were other ways to
investigate the problem.

Is there some other way of logging that is even less likely to lose data?

A couple of solutions come to mind:

1. If this seldom happens, the easiest solution would be to do nothing
2. The Logger library has built-in support for period-based rotated logging; the 'daily' option seems perfect (see the sketch after this list)
3. Switch from logrotate to syslog-ng; I have not used this myself but it seems to have an option for lossless rotation
4. Add important events that you never want to miss to a table in the database
5. Use a more elaborate logging stack consisting of, for instance:
   - fluent-bit to collect log entries from your application
   - loki to ingest/store log entries
   - grafana to visualize/monitor/alert on logs
   When using (managed) Kubernetes this is pretty easy to set up. If you go
   this way it might also be a good idea to use the 'semantic-logger' gem
   to generate log entries in JSON format; that way you can easily add more
   context to log entries and use it in grafana.
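
For option 2, a minimal sketch of Logger's built-in period-based rotation
(the log path is illustrative):

require 'logger'

# Logger rotates the file itself once a day ('weekly' and 'monthly' also
# work), so no external logrotate run can truncate it mid-write.
logger = Logger.new('log/app.log', 'daily')
logger.info('rotation handled by Logger itself')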


1) There is the possibility that the missing information could be crucial.
Clients are settling bets based on these processes. If things go tits up on
a major event (think Football World Cup and the like) a lot of money could
be at stake, and if it is not our problem but we cannot prove it then we
could take a financial and reputational hit. Besides, not knowing what
happened when it went wrong makes fixing it hard.

2) We used to use the rotation feature that was built into the Logger
package, but we found in the past that when summer time happened the logger
died because the new file already existed (not sure what was happening, but
moving the logfile with that day's timestamp elsewhere fixed it). It was in
our support calendar to check for this specifically until we had migrated
everything to use the standard logrotate. This was years ago and it may
be fixed now, but testing it is not convenient when summer time changes
happen only twice a year :)

3) syslog does look like the next thing to look at. I had a feeling that
the most reliable alternative would be this. I will have to dig into it
further.

I have considered logging to a queue and having a reader create the actual
log file, but syslog will probably be the simpler solution.
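
For the syslog route, a minimal sketch with the stdlib Syslog::Logger
wrapper (the program name is illustrative):

require 'syslog/logger'

# Hand log lines to the local syslog daemon under an illustrative program
# name; the daemon then owns persistence, rotation and any forwarding.
logger = Syslog::Logger.new('results')
logger.info('transaction settled')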

Thanks for your input.

You are welcome.

Ah, financial transactions. In that case it seems to me that you have two separate requirements:
1. Logging of requests, responses, errors. So the usual stuff to monitor the application
2. An audit trail of each financial event. In my opinion this does not belong in a logfile but in a database.
So, in a single database transaction: create the bet, lock the customer account, debit the customer account, and create an audit entry. If the transaction succeeds, happy days; if it fails, try to create a failure audit entry and report back to the customer. If all of that fails (a Ruby exception), then hopefully the logfile will at least record the error.
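
A minimal sketch of that single transaction, assuming ActiveRecord and
hypothetical Bet, Account and AuditEntry models (all names here are
illustrative):

begin
  ActiveRecord::Base.transaction do
    bet     = Bet.create!(customer_id: customer.id, stake: stake)
    account = Account.lock.find(customer.account_id)    # row-level lock
    account.update!(balance: account.balance - stake)   # debit the stake
    AuditEntry.create!(bet_id: bet.id, action: 'bet_placed', amount: stake)
  end
rescue ActiveRecord::ActiveRecordError => e
  # The transaction has rolled back; try to record a failure audit entry
  # and fall back to the logfile if even that fails.
  AuditEntry.create!(action: 'bet_failed', detail: e.message) rescue nil
  logger.error("bet failed: #{e.class}: #{e.message}")
end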


Can you provide your logrotate configuration or an example of it? It's
been a while since I messed with logrotate, but I seem to recall that there
are cases when using compression where there could be gaps in the saved
data. I also want to say that there's a workaround for that problem. In
any case, seeing your configuration may help us all understand what's going
on in your case.

-Jeremy


In addition to the other suggestions, you might try switching logrotate
from copytruncate mode to move/signal and reopen the log when the process
gets signaled. Though I've never used that technique in ruby; I usually
jump straight to syslog. (Even better: syslog local and forwarded to
a collector over TCP/TLS with a local spool to buffer messages when the
remote connection is down.)

Another option is to log to stderr and let the service manager
(daemontools, s6, systemd, etc.) or container runtime capture and preserve
the messages.
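
A minimal sketch of the stderr approach, assuming the supervisor captures
the stream:

require 'logger'

# Write to stderr and let the service manager (systemd/journald, s6,
# daemontools, a container runtime, ...) handle persistence and rotation.
$stderr.sync = true            # flush each line immediately
logger = Logger.new($stderr)
logger.info('settlement event recorded')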


We are not doing the transactions; we are settling the market that the bets
are placed against. We say something like "Player X scored the first goal"
and the client then pays out the customers who made that bet and takes
money from those that bet on a different goal scorer. If we get it wrong,
the client has paid out money they cannot get back (reasons) and then they
have to pay out the correct bet, "Player Y is the first goal scorer". There
are conditions that should make us delay announcing the settlement (a goal
being challenged / reviewed). It's not enough to know that a goal was
scored; we need to be sure that the goal is valid. So we log like crazy
to make sure we know why we made our decisions.

Around 50 to 800 events that may affect the settlement of a market occur
per match, so we can log GBs a day. But if something goes wrong, sod's law
says it will be in the 2-second window when logrotate runs.

Anyhow, that was a bit off topic.


Here is the logrotate configuration:

/home/results/results/shared/log/*.log {
  daily
  missingok
  rotate 30
  compress
  delaycompress
  copytruncate
  notifempty
  dateext
  create 644 results results
}


I'm pretty sure copytruncate is the problem. As documented in the manpage
for logrotate for that option:

"Note that there is a very small time slice between copying the file and
truncating it, so some logging data might be lost."

The amount of data lost will depend on how rapidly data is being written to
the file. You'll want to switch away from that option, and the postrotate
option may be the best way to go if you want to keep using logrotate.
However, your app would need to be modified to be able to respond to some
kind of signal so that it can re-open the log file. If modifying your
application in that way isn't feasible, then you'll want to consider one of
the other solutions suggested earlier in this thread. I personally
recommend against using the syslog protocol due to limitations around
handling embedded newlines in your log messages, but if you can avoid or
escape the newlines, it should be fine.
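
A sketch of what that could look like, based on the configuration posted
above; the pid file path is hypothetical, and the app has to trap HUP and
reopen its log (as in the test script later in this thread):

/home/results/results/shared/log/*.log {
  daily
  missingok
  rotate 30
  compress
  delaycompress
  notifempty
  dateext
  create 644 results results
  sharedscripts
  postrotate
    # hypothetical pid file; the app reopens its logfile on HUP
    kill -HUP "$(cat /home/results/results/shared/pids/app.pid)" 2>/dev/null || true
  endscript
}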

-Jeremy


Sounds like a thing to test would be removing copytruncate and seeing what
happens. Thanks for the insight.

A super quick and very dirty test of concept:

test.rb has one thread that writes to the logger and another thread that
renames the file and sends a HUP signal (representing logrotate).

$ rm /tmp/loggerhup* ; ruby test.rb /tmp/loggerhup & tail -F /tmp/loggerhup
[1] 44773
tail: cannot open '/tmp/loggerhup' for reading: No such file or directory
do
tail: '/tmp/loggerhup' has appeared; following new file
# Logfile created on 2023-08-04 17:23:46 -0400 by logger.rb/v1.4.2
I, [2023-08-04T17:23:47.795742 #44773] INFO -- : 1
I, [2023-08-04T17:23:47.952905 #44773] INFO -- : 2
I, [2023-08-04T17:23:48.051453 #44773] INFO -- : 3
I, [2023-08-04T17:23:48.274491 #44773] INFO -- : 4
I, [2023-08-04T17:23:48.359070 #44773] INFO -- : 5
D, [2023-08-04T17:23:48.797585 #44773] DEBUG -- : Received signal, reopening...
tail: '/tmp/loggerhup' has been replaced; following new file
# Logfile created on 2023-08-04 17:23:52 -0400 by logger.rb/v1.4.2
I, [2023-08-04T17:23:52.951757 #44773] INFO -- : 14
I, [2023-08-04T17:23:53.281451 #44773] INFO -- : 15
I, [2023-08-04T17:23:53.457322 #44773] INFO -- : 16
I, [2023-08-04T17:23:54.426525 #44773] INFO -- : 17
D, [2023-08-04T17:23:54.799463 #44773] DEBUG -- : Received signal, reopening...
end
tail: '/tmp/loggerhup' has been replaced; following new file
# Logfile created on 2023-08-04 17:23:56 -0400 by logger.rb/v1.4.2
I, [2023-08-04T17:23:57.443366 #44773] INFO -- : 22
I, [2023-08-04T17:23:57.884349 #44773] INFO -- : 23
I, [2023-08-04T17:23:58.713242 #44773] INFO -- : 24
^C[1]+ Done ruby test.rb /tmp/loggerhup

After the run, no messages are missing:

$ grep . /tmp/loggerhup*
/tmp/loggerhup:# Logfile created on 2023-08-04 17:23:56 -0400 by logger.rb/v1.4.2
/tmp/loggerhup:I, [2023-08-04T17:23:57.443366 #44773] INFO -- : 22
/tmp/loggerhup:I, [2023-08-04T17:23:57.884349 #44773] INFO -- : 23
/tmp/loggerhup:I, [2023-08-04T17:23:58.713242 #44773] INFO -- : 24
/tmp/loggerhup-005:# Logfile created on 2023-08-04 17:23:46 -0400 by logger.rb/v1.4.2
/tmp/loggerhup-005:I, [2023-08-04T17:23:47.795742 #44773] INFO -- : 1
/tmp/loggerhup-005:I, [2023-08-04T17:23:47.952905 #44773] INFO -- : 2
/tmp/loggerhup-005:I, [2023-08-04T17:23:48.051453 #44773] INFO -- : 3
/tmp/loggerhup-005:I, [2023-08-04T17:23:48.274491 #44773] INFO -- : 4
/tmp/loggerhup-005:I, [2023-08-04T17:23:48.359070 #44773] INFO -- : 5
/tmp/loggerhup-005:D, [2023-08-04T17:23:48.797585 #44773] DEBUG -- : Received signal, reopening...
/tmp/loggerhup-008:# Logfile created on 2023-08-04 17:23:48 -0400 by logger.rb/v1.4.2
/tmp/loggerhup-008:I, [2023-08-04T17:23:49.313963 #44773] INFO -- : 6
/tmp/loggerhup-008:I, [2023-08-04T17:23:50.008360 #44773] INFO -- : 7
/tmp/loggerhup-008:I, [2023-08-04T17:23:50.131976 #44773] INFO -- : 8
/tmp/loggerhup-008:D, [2023-08-04T17:23:50.798364 #44773] DEBUG -- : Received signal, reopening...
/tmp/loggerhup-013:# Logfile created on 2023-08-04 17:23:50 -0400 by logger.rb/v1.4.2
/tmp/loggerhup-013:I, [2023-08-04T17:23:50.988746 #44773] INFO -- : 9
/tmp/loggerhup-013:I, [2023-08-04T17:23:51.255564 #44773] INFO -- : 10
/tmp/loggerhup-013:I, [2023-08-04T17:23:52.133295 #44773] INFO -- : 11
/tmp/loggerhup-013:I, [2023-08-04T17:23:52.193747 #44773] INFO -- : 12
/tmp/loggerhup-013:I, [2023-08-04T17:23:52.499093 #44773] INFO -- : 13
/tmp/loggerhup-013:D, [2023-08-04T17:23:52.798819 #44773] DEBUG -- : Received signal, reopening...
/tmp/loggerhup-017:# Logfile created on 2023-08-04 17:23:52 -0400 by logger.rb/v1.4.2
/tmp/loggerhup-017:I, [2023-08-04T17:23:52.951757 #44773] INFO -- : 14
/tmp/loggerhup-017:I, [2023-08-04T17:23:53.281451 #44773] INFO -- : 15
/tmp/loggerhup-017:I, [2023-08-04T17:23:53.457322 #44773] INFO -- : 16
/tmp/loggerhup-017:I, [2023-08-04T17:23:54.426525 #44773] INFO -- : 17
/tmp/loggerhup-017:D, [2023-08-04T17:23:54.799463 #44773] DEBUG -- : Received signal, reopening...
/tmp/loggerhup-021:# Logfile created on 2023-08-04 17:23:54 -0400 by logger.rb/v1.4.2
/tmp/loggerhup-021:I, [2023-08-04T17:23:55.304150 #44773] INFO -- : 18
/tmp/loggerhup-021:I, [2023-08-04T17:23:56.029091 #44773] INFO -- : 19
/tmp/loggerhup-021:I, [2023-08-04T17:23:56.160495 #44773] INFO -- : 20
/tmp/loggerhup-021:I, [2023-08-04T17:23:56.719132 #44773] INFO -- : 21
/tmp/loggerhup-021:D, [2023-08-04T17:23:56.800298 #44773] DEBUG -- : Received signal, reopening...

$ cat test.rb
#!/usr/bin/env ruby
require 'logger'
require_relative 'loggerhup'

logger = Logger.new(ARGV[0])
logger.level = Logger::DEBUG
logger.reopen_on_signal

puts 'do'
counter = 0
sleep 1
# Writer thread: log an incrementing counter at random intervals.
Thread.new do
  loop do
    logger.info(counter += 1)
    sleep rand
  end
end
sleep 1
# "Rotator" thread: rename the logfile and HUP the process group,
# standing in for logrotate's move-then-signal behaviour.
Thread.new do
  loop do
    File.rename(ARGV[0], '%s-%03d' % [ARGV[0], counter]) rescue nil
    Process.kill('HUP', 0)
    sleep 2
  end
end
sleep 10
puts 'end'

$ cat loggerhup.rb
class Logger
  # Install a trap for the given signal that asks a background thread to
  # reopen the logfile. The handler itself only pushes onto a Queue,
  # because Logger#reopen uses a Mutex, which is not allowed in trap context.
  def reopen_on_signal(signal = 'HUP')
    if @_hupq or @_hupt
      warn 'Ignoring repeated call to #reopen_on_signal'
      return
    end

    @_hupq ||= Queue.new
    @_hupt ||= Thread.new do
      loop do
        @_hupq.pop
        debug 'Received signal, reopening...'
        reopen
      end
    end

    Signal.trap(signal) do
      @_hupq.push(nil)
    end
  end
end

Logger#reopen can't be called directly from the signal handler since it
uses a Mutex, hence the Queue hack.
