[prev in list] [next in list] [prev in thread] [next in thread]
List: bacula-bugs
Subject: [Bacula-bugs] [bacula 0001612]: Bacula does not save a checksum for
From: Mantis Bug Tracker <nobody () baculabugs ! unixathome ! org>
Date: 2011-04-15 5:16:44
Message-ID: 88dabe87af8e8bcd4011c5f0d666d135 () bugs ! bacula ! org
[Download RAW message or body]
The following issue has been CLOSED
======================================================================
http://bugs.bacula.org/view.php?id=1612
======================================================================
Reported By: akoch
Assigned To: ebollengier
======================================================================
Project: bacula
Issue ID: 1612
Category: Director
Reproducibility: always
Severity: minor
Priority: normal
Status: closed
Resolution: open
Fixed in Version:
======================================================================
Date Submitted: 2010-07-20 11:06 GMT
Last Modified: 2011-04-15 05:16 GMT
======================================================================
Summary: Bacula does not save a checksum for hard links
during full backup, but expects one during accurate differential backups
Description:
1) Bacula does not save a checksum for hard links during a full backup. E.g.,
here's the output of FD with debug value 400 for a hard link:
/usr/share/postgresql-8.4/man/man7/table.7.bz2
gundabad-fd: backup.c:326-32 FT_LNKSAVED hard link:
/usr/share/postgresql-8.4/man/man7/table.7.bz2 =>
/usr/share/postgresql-8.4/man/man7/with.7.bz2
gundabad-fd: backup.c:424-32 bfiled: sending
/usr/share/postgresql-8.4/man/man7/table.7.bz2 to stored
gundabad-fd: backup.c:1102-32 encode_and_send_attrs
fname=/usr/share/postgresql-8.4/man/man7/table.7.bz2
gundabad-fd: backup.c:1114-32 File
/usr/share/postgresql-8.4/man/man7/table.7.bz2
attribs=PwA Dky9 IGk C A A A 6 BAA I BMBmj+ BMBmj+ BMBmkF BW A C
attribsEx=
gundabad-fd: backup.c:1133-32 >stored: attrhdr 138 1 0
gundabad-fd: backup.c:1226-32 No strip for
/usr/share/postgresql-8.4/man/man7/table.7.bz2
gundabad-fd: backup.c:1151-32 Link
/usr/share/postgresql-8.4/man/man7/table.7.bz2 to
/usr/share/postgresql-8.4/man/man7/with.7.bz2
gundabad-fd: backup.c:1167-32 >stored: attr len=157: 138 1
/usr/share/postgresql-8.4/man/man7/table.7.bz2
gundabad-fd: backup.c:532-32 do_read=0
gundabad-fd: find_one.c:480-32 FT_LNKSAVED FI=138 LinkFI=86
file=/usr/share/postgresql-8.4/man/man7/with.7.bz2
Note the absence of the crypto_digest line.
2) The checksum _is_ saved for the regular file pointed to by the hard link:
gundabad-fd: find_one.c:357-32 File ----:
/usr/share/postgresql-8.4/man/man7/with.7.bz2
gundabad-fd: find_one.c:494-32 added to hash FI=85
file=/usr/share/postgresql-8.4/man/man7/with.7.bz2
gundabad-fd: backup.c:333-32 FT_REG saving:
/usr/share/postgresql-8.4/man/man7/with.7.bz2
gundabad-fd: backup.c:424-32 bfiled: sending
/usr/share/postgresql-8.4/man/man7/with.7.bz2 to stored
gundabad-fd: crypto.c:607-32 crypto_digest_new jcr=207b978
gundabad-fd: backup.c:1102-32 encode_and_send_attrs
fname=/usr/share/postgresql-8.4/man/man7/with.7.bz2
gundabad-fd: backup.c:1114-32 File /usr/share/postgresql-8.4/man/man7/with.7.bz2
attribs=PwA Dky9 IGk C A A A 6 BAA I BMBmj+ BMBmj+ BMBmkF A A C
attribsEx=
gundabad-fd: backup.c:1133-32 >stored: attrhdr 86 1 0
gundabad-fd: backup.c:1226-32 No strip for
/usr/share/postgresql-8.4/man/man7/with.7.bz2
gundabad-fd: backup.c:1167-32 >stored: attr len=109: 86 3
/usr/share/postgresql-8.4/man/man7/with.7.bz2
gundabad-fd: backup.c:532-32 do_read=1
gundabad-fd: bfile.c:900-32 open file
/usr/share/postgresql-8.4/man/man7/with.7.bz2
gundabad-fd: bfile.c:924-32 Open file 7
gundabad-fd: backup.c:786-32 Saving data, type=3
gundabad-fd: backup.c:863-32 >stored: datahdr 86 2 0
gundabad-fd: backup.c:1023-32 Send data to SD len=58
gundabad-fd: bfile.c:965-32 Close file 7
gundabad-fd: backup.c:715-32 bfiled>stored:header 86 10 0
gundabad-fd: find_one.c:518-32 FT_REG FI=86 linked=1
file=/usr/share/postgresql-8.4/man/man7/with.7.bz2
3) When doing a Differential Backup in Accurate pins1 mode building on that Full
backup, Bacula does expect a checksum to be present for hard links. Since that
checksum was never stored in the first place, all hard links cause warnings:
Warning: Can't verify checksum for (linkname...)
Steps to Reproduce:
1) Do a full backup of a FileSet containing one or more hard links.
2) Do an Accurate Differential of that same FileSet. All hard links in the set
will cause the warning due to missing checksum.
======================================================================
----------------------------------------------------------------------
(0005582) ebollengier (administrator) - 2010-08-31 13:24
http://bugs.bacula.org/view.php?id=1612#c5582
----------------------------------------------------------------------
We need to add the checksum in the hardlink node and send it like with other
files.
----------------------------------------------------------------------
(0005762) ebollengier (administrator) - 2011-02-19 14:23
http://bugs.bacula.org/view.php?id=1612#c5762
----------------------------------------------------------------------
Can you try the proposed patch and give us your feedbacks ?
----------------------------------------------------------------------
(0005764) bart613 (reporter) - 2011-02-20 10:15
http://bugs.bacula.org/view.php?id=1612#c5764
----------------------------------------------------------------------
For me, after applying attached patch on client, director and storage daemon, it
caused client to fail and eat up all memory causing OOM killer to kick-in and
kill everything around.
After downgrading to version without this patch everything went back to normal.
I can replicate this as many times as you want as well as I can provide SRPM/RPM
with attached patch. If you need me to perform any other operations let me know
- I will happily do.
----------------------------------------------------------------------
(0005773) ebollengier (administrator) - 2011-02-24 19:50
http://bugs.bacula.org/view.php?id=1612#c5773
----------------------------------------------------------------------
A bit strange, I will probably simplify the patch and just skip the message.
----------------------------------------------------------------------
(0005831) bart613 (reporter) - 2011-04-11 10:08
http://bugs.bacula.org/view.php?id=1612#c5831
----------------------------------------------------------------------
This bug is becoming very annoying. Especially when I am trying to read Bacula
daily backups report emails -- have to scroll down every single one through
several hundreds lines to get to the interesting bit.
As I mentioned before, I tested the patch on three installations, and it caused
OOM killer on all three.
Is there anything I could do to help you nail down the problem?
----------------------------------------------------------------------
(0005832) ebollengier (administrator) - 2011-04-11 13:01
http://bugs.bacula.org/view.php?id=1612#c5832
----------------------------------------------------------------------
I would like also fix this problem before the 5.2 release, but I'm running a bit
out of time nowdays. We have two choices, the first one is to fix the current
patch, which is pretty good I think, the second solution is to discard the
warning message.
I think that you can add an extra check when storing the checksum in the
hardlink struct and run your tests again.
in find_one.c:ff_pkt_set_digest()
- if (ff_pkt->linked)
+ if (ff_pkt->linked && (ff_pkt->linked->digest == NULL))
----------------------------------------------------------------------
(0005833) bart613 (reporter) - 2011-04-11 14:15
http://bugs.bacula.org/view.php?id=1612#c5833
----------------------------------------------------------------------
Same result -- when I tried to either run Full or Incremental backup, bacula-fd
is using almost all memory and server ending up with OOM killer rushing around.
Shortly before server ran out of memory:
[...]
root 16036 2.5 89.9 2825240 942896 ? Ssl 17:53 0:59
/usr/sbin/bacula-fd -u root -g bacula -c /etc/bacula/bacula-fd.conf
Tried to strace bacula-fd during this time but can't see anything interesting
there:
# strace -tt -f -p 16036
Process 16036 attached with 4 threads - interrupt to quit
[pid 16169] 18:32:46.345833 select(6, [5], NULL, NULL, {1, 0} <unfinished ...>
[pid 16037] 18:32:46.354453 clock_gettime(CLOCK_REALTIME, <unfinished ...>
[pid 16036] 18:32:46.354554 futex(0x2b21448df380, FUTEX_WAIT_PRIVATE, 2, NULL
<unfinished ...>
[pid 16037] 18:32:46.354653 <... clock_gettime resumed> {1302543166, 354506000})
= 0
[pid 16037] 18:32:46.354722 futex(0x2b21448dfe44, FUTEX_WAIT_PRIVATE, 389, {13,
105518000} <unfinished ...>
[pid 16169] 18:32:47.353145 <... select resumed> ) = 0 (Timeout)
[pid 16169] 18:32:47.353393 select(6, [5], NULL, NULL, {5, 0}) = 0 (Timeout)
[pid 16169] 18:32:52.353624 select(6, [5], NULL, NULL, {5, 0} <unfinished ...>
Ready for further testing and feeding back :-)
----------------------------------------------------------------------
(0005835) bart613 (reporter) - 2011-04-11 14:19
http://bugs.bacula.org/view.php?id=1612#c5835
----------------------------------------------------------------------
Also, more to add, when I killed bacula-fd I got following as a backup job
report:
11-Apr 17:56 some.host.com-dir JobId 925: Start Backup JobId 925,
Job=donkey_FS.2011-04-11_17.56.23_05
11-Apr 17:56 some.host.com-dir JobId 925: Using Device "olartek-FileStorage"
11-Apr 17:56 some.host.com JobId 925: Volume "olartek-0026" previously written,
moving to end of data.
11-Apr 17:56 some.host.com JobId 925: Error: Bacula cannot write on disk Volume
"olartek-0026" because: The sizes do not match! Volume=22855298947
Catalog=22852718467
11-Apr 17:56 some.host.com JobId 925: Marking Volume "olartek-0026" in Error in
Catalog.
11-Apr 17:56 some.host.com-dir JobId 925: Created new Volume "olartek-0029" in
catalog.
11-Apr 17:56 some.host.com JobId 925: Labeled new Volume "olartek-0029" on
device "olartek-FileStorage" (/var/lib/bacula/storage/).
11-Apr 17:56 some.host.com JobId 925: Wrote label to prelabeled Volume
"olartek-0029" on device "olartek-FileStorage" (/var/lib/bacula/storage/)
11-Apr 18:33 some.host.com JobId 925: Fatal error: append.c:242 Network error
reading from FD. ERR=No data available
11-Apr 18:33 some.host.com-dir JobId 925: Fatal error: Network error with FD
during Backup: ERR=No data available
11-Apr 18:33 some.host.com JobId 925: Job write elapsed time = 00:36:35,
Transfer rate = 36.60 K Bytes/second
11-Apr 18:33 some.host.com-dir JobId 925: Fatal error: No Job status returned
from FD.
11-Apr 18:33 some.host.com-dir JobId 925: Error: Bacula some.host.com-dir 5.0.3
(04Aug10): 11-Apr-2011 18:33:04
Build OS: x86_64-redhat-linux-gnu redhat
JobId: 925
Job: donkey_FS.2011-04-11_17.56.23_05
Backup Level: Full
Client: "some.host.com-fd" 5.0.3 (04Aug10)
x86_64-redhat-linux-gnu,redhat,
FileSet: "donkey FileSet" 2011-01-29 02:05:00
Pool: "olartek-polka Pool" (From Job resource)
Catalog: "PWS" (From Client resource)
Storage: "olartek-polka" (From Pool resource)
Scheduled time: 11-Apr-2011 17:56:19
Start time: 11-Apr-2011 17:56:26
End time: 11-Apr-2011 18:33:04
Elapsed time: 36 mins 38 secs
Priority: 10
FD Files Written: 0
SD Files Written: 0
FD Bytes Written: 0 (0 B)
SD Bytes Written: 0 (0 B)
Rate: 0.0 KB/s
Software Compression: None
VSS: no
Encryption: no
Accurate: yes
Volume name(s): olartek-0029
Volume Session Id: 54
Volume Session Time: 1302262981
Last Volume Bytes: 215 (215 B)
Non-fatal FD errors: 0
SD Errors: 0
FD termination status: Error
SD termination status: Running
Termination: *** Backup Error ***
----------------------------------------------------------------------
(0005836) ebollengier (administrator) - 2011-04-11 15:24
http://bugs.bacula.org/view.php?id=1612#c5836
----------------------------------------------------------------------
How many hardlinks do you have on your system? It's hard to believe that a
single strdup of 10 checksums can cause a out of memory. If you have millions of
them, we will have to find an other solution. It would be nice to test with a
small set of files. My tests didn't show any orphan buffers.
I can provide you a patch for the current 5.1 release.
----------------------------------------------------------------------
(0005837) bart613 (reporter) - 2011-04-12 07:50
http://bugs.bacula.org/view.php?id=1612#c5837
----------------------------------------------------------------------
There is probably around 8.000. Most of the usual hardlinks that are in pretty
much every single CentOS 5 installation these days. It rather seems like
something is dead-looping somewhere at least it uses all the memory.
If you provide patch for 5.1 release I will upgrade and double-check, perhaps it
will nicely resolve itself there.
----------------------------------------------------------------------
(0005838) ebollengier (administrator) - 2011-04-12 12:59
http://bugs.bacula.org/view.php?id=1612#c5838
----------------------------------------------------------------------
I know why my patch doesn't work, the digest string is a binary, so using
strdup() is not a good idea with it. I have to figure how to change that.
----------------------------------------------------------------------
(0005839) ebollengier (administrator) - 2011-04-12 15:46
http://bugs.bacula.org/view.php?id=1612#c5839
----------------------------------------------------------------------
The new version of the patch should work as expected now.
----------------------------------------------------------------------
(0005840) bart613 (reporter) - 2011-04-12 17:53
http://bugs.bacula.org/view.php?id=1612#c5840
----------------------------------------------------------------------
Is this one for 5.0.3 or for 5.1? It does not seem to apply to 5.0.3 cleanly...
----------------------------------------------------------------------
(0005841) ebollengier (administrator) - 2011-04-12 18:32
http://bugs.bacula.org/view.php?id=1612#c5841
----------------------------------------------------------------------
This patch is for 5.1.
----------------------------------------------------------------------
(0005842) ebollengier (administrator) - 2011-04-13 06:15
http://bugs.bacula.org/view.php?id=1612#c5842
----------------------------------------------------------------------
I found a small mistake in the previous patch, this one should be the latest.
Thanks for your testing.
----------------------------------------------------------------------
(0005846) ebollengier (administrator) - 2011-04-15 05:16
http://bugs.bacula.org/view.php?id=1612#c5846
----------------------------------------------------------------------
Fixed in the current dev branch.
Issue History
Date Modified Username Field Change
======================================================================
2010-07-20 11:06 akoch New Issue
2010-07-20 13:04 mnalis Issue Monitored: mnalis
2010-08-31 13:24 ebollengier Note Added: 0005582
2010-08-31 13:24 ebollengier Status new => acknowledged
2011-02-13 10:45 bart613 Issue Monitored: bart613
2011-02-19 14:22 ebollengier File Added:
0001-Fix-1612-about-checksum-for-hardlinks.patch
2011-02-19 14:23 ebollengier Note Added: 0005762
2011-02-19 14:23 ebollengier Assigned To => ebollengier
2011-02-19 14:23 ebollengier Status acknowledged =>
feedback
2011-02-20 10:15 bart613 Note Added: 0005764
2011-02-24 19:50 ebollengier Note Added: 0005773
2011-04-11 10:08 bart613 Note Added: 0005831
2011-04-11 13:01 ebollengier Note Added: 0005832
2011-04-11 14:15 bart613 Note Added: 0005833
2011-04-11 14:18 bart613 Note Added: 0005834
2011-04-11 14:18 bart613 Note Deleted: 0005834
2011-04-11 14:19 bart613 Note Added: 0005835
2011-04-11 15:25 ebollengier Note Added: 0005836
2011-04-11 15:26 ebollengier File Deleted:
0001-Fix-1612-about-checksum-for-hardlinks.patch
2011-04-11 15:26 ebollengier File Added:
0001-Fix-1612-about-checksum-for-hardlinks.patch
2011-04-12 07:50 bart613 Note Added: 0005837
2011-04-12 12:59 ebollengier Note Added: 0005838
2011-04-12 15:45 ebollengier File Deleted:
0001-Fix-1612-about-checksum-for-hardlinks.patch
2011-04-12 15:46 ebollengier File Added:
0001-Fix-1612-about-checksum-for-hardlinks.patch
2011-04-12 15:46 ebollengier Note Added: 0005839
2011-04-12 17:53 bart613 Note Added: 0005840
2011-04-12 18:32 ebollengier Note Added: 0005841
2011-04-13 06:07 ebollengier File Deleted:
0001-Fix-1612-about-checksum-for-hardlinks.patch
2011-04-13 06:14 ebollengier File Added:
0001-Fix-1612-about-checksum-for-hardlinks-new.patch
2011-04-13 06:15 ebollengier Note Added: 0005842
2011-04-15 05:16 ebollengier Note Added: 0005846
2011-04-15 05:16 ebollengier Status feedback => closed
======================================================================
------------------------------------------------------------------------------
Benefiting from Server Virtualization: Beyond Initial Workload
Consolidation -- Increasing the use of server virtualization is a top
priority.Virtualization can reduce costs, simplify management, and improve
application availability and disaster protection. Learn more about boosting
the value of server virtualization. http://p.sf.net/sfu/vmware-sfdev2dev
_______________________________________________
Bacula-bugs mailing list
Bacula-bugs@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-bugs
[prev in list] [next in list] [prev in thread] [next in thread]
Configure |
About |
News |
Add a list |
Sponsored by KoreLogic