[prev in list] [next in list] [prev in thread] [next in thread] 

List:       bacula-bugs
Subject:    [Bacula-bugs] [bacula 0001612]: Bacula does not save a checksum for
From:       Mantis Bug Tracker <nobody () baculabugs ! unixathome ! org>
Date:       2011-04-15 5:16:44
Message-ID: 88dabe87af8e8bcd4011c5f0d666d135 () bugs ! bacula ! org
[Download RAW message or body]


The following issue has been CLOSED 
====================================================================== 
http://bugs.bacula.org/view.php?id=1612 
====================================================================== 
Reported By:                akoch
Assigned To:                ebollengier
====================================================================== 
Project:                    bacula
Issue ID:                   1612
Category:                   Director
Reproducibility:            always
Severity:                   minor
Priority:                   normal
Status:                     closed
Resolution:                 open
Fixed in Version:           
====================================================================== 
Date Submitted:             2010-07-20 11:06 GMT
Last Modified:              2011-04-15 05:16 GMT
====================================================================== 
Summary:                    Bacula does not save a checksum for hard links
during full backup, but expects one during accurate differential backups
Description: 
1) Bacula does not save a checksum for hard links during a full backup. E.g.,
here's the output of FD with debug value 400 for a hard link:

/usr/share/postgresql-8.4/man/man7/table.7.bz2
gundabad-fd: backup.c:326-32 FT_LNKSAVED hard link:
/usr/share/postgresql-8.4/man/man7/table.7.bz2 =>
/usr/share/postgresql-8.4/man/man7/with.7.bz2
gundabad-fd: backup.c:424-32 bfiled: sending
/usr/share/postgresql-8.4/man/man7/table.7.bz2 to stored
gundabad-fd: backup.c:1102-32 encode_and_send_attrs
fname=/usr/share/postgresql-8.4/man/man7/table.7.bz2
gundabad-fd: backup.c:1114-32 File
/usr/share/postgresql-8.4/man/man7/table.7.bz2
attribs=PwA Dky9 IGk C A A A 6 BAA I BMBmj+ BMBmj+ BMBmkF BW A C
attribsEx=
gundabad-fd: backup.c:1133-32 >stored: attrhdr 138 1 0
gundabad-fd: backup.c:1226-32 No strip for
/usr/share/postgresql-8.4/man/man7/table.7.bz2
gundabad-fd: backup.c:1151-32 Link
/usr/share/postgresql-8.4/man/man7/table.7.bz2 to
/usr/share/postgresql-8.4/man/man7/with.7.bz2
gundabad-fd: backup.c:1167-32 >stored: attr len=157: 138 1
/usr/share/postgresql-8.4/man/man7/table.7.bz2
gundabad-fd: backup.c:532-32 do_read=0
gundabad-fd: find_one.c:480-32 FT_LNKSAVED FI=138 LinkFI=86
file=/usr/share/postgresql-8.4/man/man7/with.7.bz2

Note the absence of the crypto_digest line.

2) The checksum _is_ saved for the regular file pointed to by the hard link:

gundabad-fd: find_one.c:357-32 File ----:
/usr/share/postgresql-8.4/man/man7/with.7.bz2
gundabad-fd: find_one.c:494-32 added to hash FI=85
file=/usr/share/postgresql-8.4/man/man7/with.7.bz2
gundabad-fd: backup.c:333-32 FT_REG saving:
/usr/share/postgresql-8.4/man/man7/with.7.bz2
gundabad-fd: backup.c:424-32 bfiled: sending
/usr/share/postgresql-8.4/man/man7/with.7.bz2 to stored
gundabad-fd: crypto.c:607-32 crypto_digest_new jcr=207b978
gundabad-fd: backup.c:1102-32 encode_and_send_attrs
fname=/usr/share/postgresql-8.4/man/man7/with.7.bz2
gundabad-fd: backup.c:1114-32 File /usr/share/postgresql-8.4/man/man7/with.7.bz2
attribs=PwA Dky9 IGk C A A A 6 BAA I BMBmj+ BMBmj+ BMBmkF A A C
attribsEx=
gundabad-fd: backup.c:1133-32 >stored: attrhdr 86 1 0
gundabad-fd: backup.c:1226-32 No strip for
/usr/share/postgresql-8.4/man/man7/with.7.bz2
gundabad-fd: backup.c:1167-32 >stored: attr len=109: 86 3
/usr/share/postgresql-8.4/man/man7/with.7.bz2
gundabad-fd: backup.c:532-32 do_read=1
gundabad-fd: bfile.c:900-32 open file
/usr/share/postgresql-8.4/man/man7/with.7.bz2
gundabad-fd: bfile.c:924-32 Open file 7
gundabad-fd: backup.c:786-32 Saving data, type=3
gundabad-fd: backup.c:863-32 >stored: datahdr 86 2 0
gundabad-fd: backup.c:1023-32 Send data to SD len=58
gundabad-fd: bfile.c:965-32 Close file 7
gundabad-fd: backup.c:715-32 bfiled>stored:header 86 10 0
gundabad-fd: find_one.c:518-32 FT_REG FI=86 linked=1
file=/usr/share/postgresql-8.4/man/man7/with.7.bz2

3) When doing a Differential Backup in Accurate pins1 mode building on that Full
backup, Bacula does expect a checksum to be present for hard links. Since that
checksum was never stored in the first place, all hard links cause warnings:

   Warning: Can't verify checksum for (linkname...)

   



Steps to Reproduce: 
1) Do a full backup of a FileSet containing one or more hard links.

2) Do an Accurate Differential of that same FileSet. All hard links in the set
will cause the warning due to missing checksum.
====================================================================== 

---------------------------------------------------------------------- 
 (0005582) ebollengier (administrator) - 2010-08-31 13:24
 http://bugs.bacula.org/view.php?id=1612#c5582 
---------------------------------------------------------------------- 
We need to add the checksum in the hardlink node and send it like with other
files. 

---------------------------------------------------------------------- 
 (0005762) ebollengier (administrator) - 2011-02-19 14:23
 http://bugs.bacula.org/view.php?id=1612#c5762 
---------------------------------------------------------------------- 
Can you try the proposed patch and give us your feedbacks ? 

---------------------------------------------------------------------- 
 (0005764) bart613 (reporter) - 2011-02-20 10:15
 http://bugs.bacula.org/view.php?id=1612#c5764 
---------------------------------------------------------------------- 
For me, after applying attached patch on client, director and storage daemon, it
caused client to fail and eat up all memory causing OOM killer to kick-in and
kill everything around.

After downgrading to version without this patch everything went back to normal.

I can replicate this as many times as you want as well as I can provide SRPM/RPM
with attached patch.  If you need me to perform any other operations let me know
- I will happily do. 

---------------------------------------------------------------------- 
 (0005773) ebollengier (administrator) - 2011-02-24 19:50
 http://bugs.bacula.org/view.php?id=1612#c5773 
---------------------------------------------------------------------- 
A bit strange, I will probably simplify the patch and just skip the message. 

---------------------------------------------------------------------- 
 (0005831) bart613 (reporter) - 2011-04-11 10:08
 http://bugs.bacula.org/view.php?id=1612#c5831 
---------------------------------------------------------------------- 
This bug is becoming very annoying.  Especially when I am trying to read Bacula
daily backups report emails -- have to scroll down every single one through
several hundreds lines to get to the interesting bit.

As I mentioned before, I tested the patch on three installations, and it caused
OOM killer on all three.

Is there anything I could do to help you nail down the problem? 

---------------------------------------------------------------------- 
 (0005832) ebollengier (administrator) - 2011-04-11 13:01
 http://bugs.bacula.org/view.php?id=1612#c5832 
---------------------------------------------------------------------- 
I would like also fix this problem before the 5.2 release, but I'm running a bit
out of time nowdays. We have two choices, the first one is to fix the current
patch, which is pretty good I think, the second solution is to discard the
warning message.

I think that you can add an extra check when storing the checksum in the
hardlink struct and run your tests again.

in find_one.c:ff_pkt_set_digest()
- if (ff_pkt->linked)
+ if (ff_pkt->linked && (ff_pkt->linked->digest == NULL)) 

---------------------------------------------------------------------- 
 (0005833) bart613 (reporter) - 2011-04-11 14:15
 http://bugs.bacula.org/view.php?id=1612#c5833 
---------------------------------------------------------------------- 
Same result -- when I tried to either run Full or Incremental backup, bacula-fd
is using almost all memory and server ending up with OOM killer rushing around.

Shortly before server ran out of memory:
[...]
root     16036  2.5 89.9 2825240 942896 ?      Ssl  17:53   0:59
/usr/sbin/bacula-fd -u root -g bacula -c /etc/bacula/bacula-fd.conf

Tried to strace bacula-fd during this time but can't see anything interesting
there:

# strace -tt -f -p 16036
Process 16036 attached with 4 threads - interrupt to quit
[pid 16169] 18:32:46.345833 select(6, [5], NULL, NULL, {1, 0} <unfinished ...>
[pid 16037] 18:32:46.354453 clock_gettime(CLOCK_REALTIME,  <unfinished ...>
[pid 16036] 18:32:46.354554 futex(0x2b21448df380, FUTEX_WAIT_PRIVATE, 2, NULL
<unfinished ...>
[pid 16037] 18:32:46.354653 <... clock_gettime resumed> {1302543166, 354506000})
= 0
[pid 16037] 18:32:46.354722 futex(0x2b21448dfe44, FUTEX_WAIT_PRIVATE, 389, {13,
105518000} <unfinished ...>
[pid 16169] 18:32:47.353145 <... select resumed> ) = 0 (Timeout)
[pid 16169] 18:32:47.353393 select(6, [5], NULL, NULL, {5, 0}) = 0 (Timeout)
[pid 16169] 18:32:52.353624 select(6, [5], NULL, NULL, {5, 0} <unfinished ...>

Ready for further testing and feeding back :-) 

---------------------------------------------------------------------- 
 (0005835) bart613 (reporter) - 2011-04-11 14:19
 http://bugs.bacula.org/view.php?id=1612#c5835 
---------------------------------------------------------------------- 
Also, more to add, when I killed bacula-fd I got following as a backup job
report:

11-Apr 17:56 some.host.com-dir JobId 925: Start Backup JobId 925,
Job=donkey_FS.2011-04-11_17.56.23_05
11-Apr 17:56 some.host.com-dir JobId 925: Using Device "olartek-FileStorage"
11-Apr 17:56 some.host.com JobId 925: Volume "olartek-0026" previously written,
moving to end of data.
11-Apr 17:56 some.host.com JobId 925: Error: Bacula cannot write on disk Volume
"olartek-0026" because: The sizes do not match! Volume=22855298947
Catalog=22852718467
11-Apr 17:56 some.host.com JobId 925: Marking Volume "olartek-0026" in Error in
Catalog.
11-Apr 17:56 some.host.com-dir JobId 925: Created new Volume "olartek-0029" in
catalog.
11-Apr 17:56 some.host.com JobId 925: Labeled new Volume "olartek-0029" on
device "olartek-FileStorage" (/var/lib/bacula/storage/).
11-Apr 17:56 some.host.com JobId 925: Wrote label to prelabeled Volume
"olartek-0029" on device "olartek-FileStorage" (/var/lib/bacula/storage/)
11-Apr 18:33 some.host.com JobId 925: Fatal error: append.c:242 Network error
reading from FD. ERR=No data available
11-Apr 18:33 some.host.com-dir JobId 925: Fatal error: Network error with FD
during Backup: ERR=No data available
11-Apr 18:33 some.host.com JobId 925: Job write elapsed time = 00:36:35,
Transfer rate = 36.60 K Bytes/second
11-Apr 18:33 some.host.com-dir JobId 925: Fatal error: No Job status returned
from FD.
11-Apr 18:33 some.host.com-dir JobId 925: Error: Bacula some.host.com-dir 5.0.3
(04Aug10): 11-Apr-2011 18:33:04
 Build OS:               x86_64-redhat-linux-gnu redhat
 JobId:                  925
 Job:                    donkey_FS.2011-04-11_17.56.23_05
 Backup Level:           Full
 Client:                 "some.host.com-fd" 5.0.3 (04Aug10)
x86_64-redhat-linux-gnu,redhat,
 FileSet:                "donkey FileSet" 2011-01-29 02:05:00
 Pool:                   "olartek-polka Pool" (From Job resource)
 Catalog:                "PWS" (From Client resource)
 Storage:                "olartek-polka" (From Pool resource)
 Scheduled time:         11-Apr-2011 17:56:19
 Start time:             11-Apr-2011 17:56:26
 End time:               11-Apr-2011 18:33:04
 Elapsed time:           36 mins 38 secs
 Priority:               10
 FD Files Written:       0
 SD Files Written:       0
 FD Bytes Written:       0 (0 B)
 SD Bytes Written:       0 (0 B)
 Rate:                   0.0 KB/s
 Software Compression:   None
 VSS:                    no
 Encryption:             no
 Accurate:               yes
 Volume name(s):         olartek-0029
 Volume Session Id:      54
 Volume Session Time:    1302262981
 Last Volume Bytes:      215 (215 B)
 Non-fatal FD errors:    0
 SD Errors:              0
 FD termination status:  Error
 SD termination status:  Running
 Termination:            *** Backup Error *** 

---------------------------------------------------------------------- 
 (0005836) ebollengier (administrator) - 2011-04-11 15:24
 http://bugs.bacula.org/view.php?id=1612#c5836 
---------------------------------------------------------------------- 
How many hardlinks do you have on your system? It's hard to believe that a
single strdup of 10 checksums can cause a out of memory. If you have millions of
them, we will have to find an other solution. It would be nice to test with a
small set of files. My tests didn't show any orphan buffers.

I can provide you a patch for the current 5.1 release. 

---------------------------------------------------------------------- 
 (0005837) bart613 (reporter) - 2011-04-12 07:50
 http://bugs.bacula.org/view.php?id=1612#c5837 
---------------------------------------------------------------------- 
There is probably around 8.000.  Most of the usual hardlinks that are in pretty
much every single CentOS 5 installation these days.  It rather seems like
something is dead-looping somewhere at least it uses all the memory.

If you provide patch for 5.1 release I will upgrade and double-check, perhaps it
will nicely resolve itself there. 

---------------------------------------------------------------------- 
 (0005838) ebollengier (administrator) - 2011-04-12 12:59
 http://bugs.bacula.org/view.php?id=1612#c5838 
---------------------------------------------------------------------- 
I know why my patch doesn't work, the digest string is a binary, so using
strdup() is not a good idea with it. I have to figure how to change that. 

---------------------------------------------------------------------- 
 (0005839) ebollengier (administrator) - 2011-04-12 15:46
 http://bugs.bacula.org/view.php?id=1612#c5839 
---------------------------------------------------------------------- 
The new version of the patch should work as expected now. 

---------------------------------------------------------------------- 
 (0005840) bart613 (reporter) - 2011-04-12 17:53
 http://bugs.bacula.org/view.php?id=1612#c5840 
---------------------------------------------------------------------- 
Is this one for 5.0.3 or for 5.1?  It does not seem to apply to 5.0.3 cleanly...


---------------------------------------------------------------------- 
 (0005841) ebollengier (administrator) - 2011-04-12 18:32
 http://bugs.bacula.org/view.php?id=1612#c5841 
---------------------------------------------------------------------- 
This patch is for 5.1. 

---------------------------------------------------------------------- 
 (0005842) ebollengier (administrator) - 2011-04-13 06:15
 http://bugs.bacula.org/view.php?id=1612#c5842 
---------------------------------------------------------------------- 
I found a small mistake in the previous patch, this one should be the latest.
Thanks for your testing. 

---------------------------------------------------------------------- 
 (0005846) ebollengier (administrator) - 2011-04-15 05:16
 http://bugs.bacula.org/view.php?id=1612#c5846 
---------------------------------------------------------------------- 
Fixed in the current dev branch. 

Issue History 
Date Modified    Username       Field                    Change               
====================================================================== 
2010-07-20 11:06 akoch          New Issue                                    
2010-07-20 13:04 mnalis         Issue Monitored: mnalis                      
2010-08-31 13:24 ebollengier    Note Added: 0005582                          
2010-08-31 13:24 ebollengier    Status                   new => acknowledged 
2011-02-13 10:45 bart613        Issue Monitored: bart613                     
2011-02-19 14:22 ebollengier    File Added:
0001-Fix-1612-about-checksum-for-hardlinks.patch                    
2011-02-19 14:23 ebollengier    Note Added: 0005762                          
2011-02-19 14:23 ebollengier    Assigned To               => ebollengier     
2011-02-19 14:23 ebollengier    Status                   acknowledged =>
feedback
2011-02-20 10:15 bart613        Note Added: 0005764                          
2011-02-24 19:50 ebollengier    Note Added: 0005773                          
2011-04-11 10:08 bart613        Note Added: 0005831                          
2011-04-11 13:01 ebollengier    Note Added: 0005832                          
2011-04-11 14:15 bart613        Note Added: 0005833                          
2011-04-11 14:18 bart613        Note Added: 0005834                          
2011-04-11 14:18 bart613        Note Deleted: 0005834                        
2011-04-11 14:19 bart613        Note Added: 0005835                          
2011-04-11 15:25 ebollengier    Note Added: 0005836                          
2011-04-11 15:26 ebollengier    File Deleted:
0001-Fix-1612-about-checksum-for-hardlinks.patch                    
2011-04-11 15:26 ebollengier    File Added:
0001-Fix-1612-about-checksum-for-hardlinks.patch                    
2011-04-12 07:50 bart613        Note Added: 0005837                          
2011-04-12 12:59 ebollengier    Note Added: 0005838                          
2011-04-12 15:45 ebollengier    File Deleted:
0001-Fix-1612-about-checksum-for-hardlinks.patch                    
2011-04-12 15:46 ebollengier    File Added:
0001-Fix-1612-about-checksum-for-hardlinks.patch                    
2011-04-12 15:46 ebollengier    Note Added: 0005839                          
2011-04-12 17:53 bart613        Note Added: 0005840                          
2011-04-12 18:32 ebollengier    Note Added: 0005841                          
2011-04-13 06:07 ebollengier    File Deleted:
0001-Fix-1612-about-checksum-for-hardlinks.patch                    
2011-04-13 06:14 ebollengier    File Added:
0001-Fix-1612-about-checksum-for-hardlinks-new.patch                    
2011-04-13 06:15 ebollengier    Note Added: 0005842                          
2011-04-15 05:16 ebollengier    Note Added: 0005846                          
2011-04-15 05:16 ebollengier    Status                   feedback => closed  
======================================================================


------------------------------------------------------------------------------
Benefiting from Server Virtualization: Beyond Initial Workload 
Consolidation -- Increasing the use of server virtualization is a top
priority.Virtualization can reduce costs, simplify management, and improve 
application availability and disaster protection. Learn more about boosting 
the value of server virtualization. http://p.sf.net/sfu/vmware-sfdev2dev
_______________________________________________
Bacula-bugs mailing list
Bacula-bugs@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-bugs
[prev in list] [next in list] [prev in thread] [next in thread] 

Configure | About | News | Add a list | Sponsored by KoreLogic