
List:       varnish-misc
Subject:    Re: Child process recurrently being restarted
From:       Guillaume Quintard <guillaume () varnish-software ! com>
Date:       2017-07-28 8:49:32
Message-ID: CAJ6ZYQy1VjNmL3r-OhUJtoUhQyKn069tCTXG4NPHRrKcBsrtyQ () mail ! gmail ! com



Anything else using RAM on that machine? The best idea I have right now is to
monitor Varnish memory usage in parallel with the *.g_bytes counters.

-- 
Guillaume Quintard

On Thu, Jun 29, 2017 at 7:09 PM, Stefano Baldo <stefanobaldo@gmail.com>
wrote:

> Hi Guillaume and Reza.
>
> This time varnish restarted but it left some more info on syslog.
> It seems like the system is running out of memory.
>
> Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.297487] pool_herder invoked
> oom-killer: gfp_mask=0x2000d0, order=2, oom_score_adj=0
> Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.300992] pool_herder cpuset=/
> mems_allowed=0
> Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.303157] CPU: 1 PID: 16214
> Comm: pool_herder Tainted: G         C O  3.16.0-4-amd64 #1 Debian
> 3.16.36-1+deb8u2
> Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.304984] Hardware name: Xen
> HVM domU, BIOS 4.2.amazon 02/16/2017
> Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.304984]  0000000000000000
> ffffffff815123b5 ffff8800eb3652f0 0000000000000000
> Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.304984]  ffffffff8150ff8d
> 0000000000000000 ffffffff810d6e3f 0000000000000000
> Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.304984]  ffffffff81516d2e
> 0000000000000200 ffffffff810689d3 ffffffff810c43e4
> Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.304984] Call Trace:
> Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.304984]  [<ffffffff815123b5>]
> ? dump_stack+0x5d/0x78
> Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.304984]  [<ffffffff8150ff8d>]
> ? dump_header+0x76/0x1e8
> Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.304984]  [<ffffffff810d6e3f>]
> ? smp_call_function_single+0x5f/0xa0
> Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.304984]  [<ffffffff81516d2e>]
> ? mutex_lock+0xe/0x2a
> Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.304984]  [<ffffffff810689d3>]
> ? put_online_cpus+0x23/0x80
> Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.304984]  [<ffffffff810c43e4>]
> ? rcu_oom_notify+0xc4/0xe0
> Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.304984]  [<ffffffff81153d1c>]
> ? do_try_to_free_pages+0x4ac/0x520
> Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.304984]  [<ffffffff811427dd>]
> ? oom_kill_process+0x21d/0x370
> Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.304984]  [<ffffffff8114239d>]
> ? find_lock_task_mm+0x3d/0x90
> Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.304984]  [<ffffffff81142f43>]
> ? out_of_memory+0x473/0x4b0
> Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.304984]  [<ffffffff81148e0f>]
> ? __alloc_pages_nodemask+0x9ef/0xb50
> Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.304984]  [<ffffffff81065c86>]
> ? copy_process.part.25+0x116/0x1c50
> Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.304984]  [<ffffffff81058301>]
> ? __do_page_fault+0x1d1/0x4f0
> Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.304984]  [<ffffffff81067990>]
> ? do_fork+0xe0/0x3d0
> Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.304984]  [<ffffffff815188f9>]
> ? stub_clone+0x69/0x90
> Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.304984]  [<ffffffff8151858d>]
> ? system_call_fast_compare_end+0x10/0x15
> Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.367638] Mem-Info:
> Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.368962] Node 0 DMA per-cpu:
> Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.370768] CPU    0: hi:    0,
> btch:   1 usd:   0
> Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.373249] CPU    1: hi:    0,
> btch:   1 usd:   0
> Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.375652] Node 0 DMA32 per-cpu:
> Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.377508] CPU    0: hi:  186,
> btch:  31 usd:  29
> Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.379898] CPU    1: hi:  186,
> btch:  31 usd:   0
> Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.382318] active_anon:846474
> inactive_anon:1913 isolated_anon:0
> Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.382318]  active_file:408
> inactive_file:415 isolated_file:32
> Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.382318]  unevictable:20736
> dirty:27 writeback:0 unstable:0
> Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.382318]  free:16797
> slab_reclaimable:15276 slab_unreclaimable:10521
> Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.382318]  mapped:22002
> shmem:22935 pagetables:30362 bounce:0
> Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.382318]  free_cma:0
> Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.397242] Node 0 DMA
> free:15192kB min:184kB low:228kB high:276kB active_anon:416kB
> inactive_anon:60kB active_file:0kB inactive_file:0kB unevictable:20kB
> isolated(anon):0kB isolated(file):0kB present:15988kB managed:15904kB
> mlocked:20kB dirty:0kB writeback:0kB mapped:20kB shmem:80kB
> slab_reclaimable:32kB slab_unreclaimable:0kB kernel_stack:112kB
> pagetables:20kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB
> pages_scanned:0 all_unreclaimable? yes
> Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.416338] lowmem_reserve[]: 0
> 3757 3757 3757
> Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.419030] Node 0 DMA32
> free:50120kB min:44868kB low:56084kB high:67300kB active_anon:3386780kB
> inactive_anon:7592kB active_file:1732kB inactive_file:2060kB
> unevictable:82924kB isolated(anon):0kB isolated(file):128kB
> present:3915776kB managed:3849676kB mlocked:82924kB dirty:108kB
> writeback:0kB mapped:88432kB shmem:91660kB slab_reclaimable:61072kB
> slab_unreclaimable:42184kB kernel_stack:27248kB pagetables:121428kB
> unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0
> all_unreclaimable? no
> Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.440095] lowmem_reserve[]: 0 0
> 0 0
> Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.442202] Node 0 DMA: 22*4kB
> (UEM) 6*8kB (EM) 1*16kB (E) 2*32kB (UM) 2*64kB (UE) 2*128kB (EM) 3*256kB
> (UEM) 1*512kB (E) 3*1024kB (UEM) 3*2048kB (EMR) 1*4096kB (M) = 15192kB
> Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.451936] Node 0 DMA32:
> 4031*4kB (EM) 2729*8kB (EM) 324*16kB (EM) 1*32kB (R) 1*64kB (R) 0*128kB
> 0*256kB 1*512kB (R) 1*1024kB (R) 1*2048kB (R) 0*4096kB = 46820kB
> Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.460240] Node 0
> hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
> Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.464122] 24240 total pagecache
> pages
> Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.466048] 0 pages in swap cache
> Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.467672] Swap cache stats: add
> 0, delete 0, find 0/0
> Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.470159] Free swap  = 0kB
> Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.471513] Total swap = 0kB
> Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.472980] 982941 pages RAM
> Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.474380] 0 pages
> HighMem/MovableOnly
> Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.476190] 16525 pages reserved
> Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.477772] 0 pages hwpoisoned
> Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.479189] [ pid ]   uid  tgid
> total_vm      rss nr_ptes swapents oom_score_adj name
> Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.482698] [  163]     0   163
>  10419     1295      21        0             0 systemd-journal
> Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.486646] [  165]     0   165
>  10202      136      21        0         -1000 systemd-udevd
> Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.490598] [  294]     0   294
>   6351     1729      14        0             0 dhclient
> Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.494457] [  319]     0   319
>   6869       62      18        0             0 cron
> Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.498260] [  321]     0   321
>   4964       67      14        0             0 systemd-logind
> Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.502346] [  326]   105   326
>  10558      101      25        0          -900 dbus-daemon
> Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.506315] [  342]     0   342
>  65721      228      31        0             0 rsyslogd
> Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.510222] [  343]     0   343
>  88199     2108      61        0          -500 dockerd
> Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.514022] [  350]   106   350
>  18280      181      36        0             0 zabbix_agentd
> Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.518040] [  351]   106   351
>  18280      475      36        0             0 zabbix_agentd
> Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.522041] [  352]   106   352
>  18280      187      36        0             0 zabbix_agentd
> Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.526025] [  353]   106   353
>  18280      187      36        0             0 zabbix_agentd
> Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.530067] [  354]   106   354
>  18280      187      36        0             0 zabbix_agentd
> Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.534033] [  355]   106   355
>  18280      190      36        0             0 zabbix_agentd
> Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.538001] [  358]     0   358
>  66390     1826      32        0             0 fail2ban-server
> Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.541972] [  400]     0   400
>  35984      444      24        0          -500 docker-containe
> Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.545879] [  568]     0   568
>  13796      168      30        0         -1000 sshd
> Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.549733] [  576]     0   576
>   3604       41      12        0             0 agetty
> Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.553569] [  577]     0   577
>   3559       38      12        0             0 agetty
> Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.557322] [16201]     0 16201
>  29695    20707      60        0             0 varnishd
> Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.561103] [16209]   108 16209
> 118909802   822425   29398        0             0 cache-main
> Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.565002] [27352]     0 27352
>  20131      214      42        0             0 sshd
> Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.568682] [27354]  1000 27354
>  20165      211      41        0             0 sshd
> Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.572307] [27355]  1000 27355
>   5487      146      17        0             0 bash
> Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.575920] [27360]     0 27360
>  11211      107      26        0             0 sudo
> Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.579593] [27361]     0 27361
>  11584       97      27        0             0 su
> Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.583155] [27362]     0 27362
>   5481      142      15        0             0 bash
> Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.586782] [27749]     0 27749
>  20131      214      41        0             0 sshd
> Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.590428] [27751]  1000 27751
>  20164      211      39        0             0 sshd
> Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.593979] [27752]  1000 27752
>   5487      147      15        0             0 bash
> Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.597488] [28762]     0 28762
>  26528      132      17        0             0 varnishstat
> Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.601239] [28764]     0 28764
>  11211      106      26        0             0 sudo
> Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.604737] [28765]     0 28765
>  11584       97      26        0             0 su
> Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.608602] [28766]     0 28766
>   5481      141      15        0             0 bash
> Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.612288] [28768]     0 28768
>  26528      220      18        0             0 varnishstat
> Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.616189] Out of memory: Kill
> process 16209 (cache-main) score 880 or sacrifice child
> Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.620106] Killed process 16209
> (cache-main) total-vm:475639208kB, anon-rss:3289700kB, file-rss:0kB
> Jun 29 13:11:01 ip-172-25-2-8 varnishd[16201]: Child (16209) died signal=9
> Jun 29 13:11:01 ip-172-25-2-8 varnishd[16201]: Child cleanup complete
> Jun 29 13:11:01 ip-172-25-2-8 varnishd[16201]: Child (30313) Started
> Jun 29 13:11:01 ip-172-25-2-8 varnishd[16201]: Child (30313) said Child
> starts
> Jun 29 13:11:01 ip-172-25-2-8 varnishd[16201]: Child (30313) said SMF.s0
> mmap'ed 483183820800 bytes of 483183820800
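
As a sanity check (an annotation, not part of the original thread), the OOM figures above are consistent with the 450G file storage: the SMF mmap dominates cache-main's total-vm, while only the anonymous RSS is resident, and that is what exhausted the swapless ~3.7 GiB box. Plain shell arithmetic, no Varnish needed:

```shell
#!/bin/sh
# The file storage mmap reported by the child at startup:
mmap_bytes=483183820800
# total-vm and anon-rss reported by the OOM killer for cache-main, in kB:
total_vm_kb=475639208
anon_rss_kb=3289700

echo "mmap     = $((mmap_bytes / 1024)) kB (~$((mmap_bytes / 1073741824)) GiB)"
echo "total-vm = ${total_vm_kb} kB"
echo "anon-rss = ${anon_rss_kb} kB (~3.1 GiB resident)"
# The 450 GiB mmap accounts for nearly all of the virtual size; only the
# ~3.1 GiB of anonymous memory is resident on a machine with no swap.
```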
>
> Best,
> Stefano
>
>
> On Wed, Jun 28, 2017 at 11:33 AM, Reza Naghibi <reza@varnish-software.com>
> wrote:
>
>> Assuming the problem is running out of memory, you will need to do some
>> memory tuning, especially given the number of threads you are using and
>> your access patterns. Your options:
>>
>>    - Add more memory to the system
>>    - Reduce thread_pool_max
>>    - Reduce jemalloc's thread cache (MALLOC_CONF="lg_tcache_max:10")
>>    - Use some of the tuning params in here: https://info.varnish-software.com/blog/understanding-varnish-cache-memory-usage
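
A sketch of how the last two knobs might look when starting varnishd (the paths, sizes, and thread count here are illustrative, not values from this thread):

```shell
# Illustrative invocation only: lower the worker-thread ceiling and
# shrink jemalloc's per-thread cache. lg_tcache_max:10 caps tcached
# allocation sizes at 2^10 = 1024 bytes.
export MALLOC_CONF="lg_tcache_max:10"
varnishd -a :80 \
         -f /etc/varnish/default.vcl \
         -s file,/var/lib/varnish/storage.bin,450G \
         -p thread_pool_max=500
```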
>>
>>
>> --
>> Reza Naghibi
>> Varnish Software
>>
>> On Wed, Jun 28, 2017 at 9:26 AM, Guillaume Quintard <
>> guillaume@varnish-software.com> wrote:
>>
>>> Hi,
>>>
>>> can you look at "varnishstat -1 | grep g_bytes" and see if it matches
>>> the memory you are seeing?
>>>
>>> --
>>> Guillaume Quintard
>>>
>>> On Wed, Jun 28, 2017 at 3:20 PM, Stefano Baldo <stefanobaldo@gmail.com>
>>> wrote:
>>>
>>>> Hi Guillaume.
>>>>
>>>> I increased the cli_timeout yesterday to 900sec (15min) and it
>>>> restarted anyway, which seems to indicate that the thread is really stalled.
>>>>
>>>> This was 1 minute after the last restart:
>>>>
>>>> MAIN.n_object               3908216          .   object structs made
>>>> SMF.s0.g_alloc              7794510          .   Allocations outstanding
>>>>
>>>> I've just changed the I/O Scheduler to noop to see what happens.
>>>>
>>>> One interesting thing I've found is about the memory usage.
>>>>
>>>> In the 1st minute of use:
>>>> MemTotal:        3865572 kB
>>>> MemFree:          120768 kB
>>>> MemAvailable:    2300268 kB
>>>>
>>>> 1 minute before a restart:
>>>> MemTotal:        3865572 kB
>>>> MemFree:           82480 kB
>>>> MemAvailable:      68316 kB
>>>>
>>>> It seems like the system is possibly running out of memory.
>>>>
>>>> When calling varnishd, I'm specifying only "-s file,..." as storage. I
>>>> see in some examples that it is common to use "-s file" AND "-s malloc"
>>>> together. Should I be passing "-s malloc" as well to somehow try to limit
>>>> the memory usage by varnishd?
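
For context (an annotation, not a reply from the thread): each -s option defines a separate storage backend rather than a limit on the first one, so adding "-s malloc" would not cap the file storage's per-object bookkeeping memory. An illustrative combination, with made-up sizes:

```shell
# Illustrative: two independent storage backends. The 256m cap applies
# only to the malloc stevedore itself, not to varnishd's total RSS;
# in-RAM overhead for objects in the file storage remains uncapped.
varnishd -a :80 \
         -f /etc/varnish/default.vcl \
         -s malloc,256m \
         -s file,/var/lib/varnish/storage.bin,450G
```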
>>>>
>>>> Best,
>>>> Stefano
>>>>
>>>>
>>>> On Wed, Jun 28, 2017 at 4:12 AM, Guillaume Quintard <
>>>> guillaume@varnish-software.com> wrote:
>>>>
>>>>> Sadly, nothing suspicious here, you can still try:
>>>>> - bumping the cli_timeout
>>>>> - changing your disk scheduler
>>>>> - changing the advice option of the file storage
>>>>>
>>>>> I'm still convinced this is due to Varnish getting stuck waiting for
>>>>> the disk because of the file storage fragmentation.
>>>>>
>>>>> Maybe you could look at SMF.*.g_alloc and compare it to the number of
>>>>> objects. Ideally, we would have a 1:1 relation between objects and
>>>>> allocations. If that number drops prior to a restart, that would be a good
>>>>> clue.
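
A small sketch of that comparison, assuming plain `varnishstat -1` output (fed here from a captured sample so it runs without Varnish):

```shell
#!/bin/sh
# Print the allocations-per-object ratio from `varnishstat -1` output.
# A ratio well above 1 suggests fragmented file-storage allocations.
alloc_ratio() {
    awk '$1 == "MAIN.n_object"     { objects = $2 }
         $1 ~ /^SMF\..*\.g_alloc$/ { allocs += $2 }
         END { if (objects > 0) printf "%.2f\n", allocs / objects }'
}

sample='MAIN.n_object               3908216          .   object structs made
SMF.s0.g_alloc              7794510          .   Allocations outstanding'

printf '%s\n' "$sample" | alloc_ratio
# prints 1.99 for this sample (roughly two allocations per object)
```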
>>>>>
>>>>>
>>>>> --
>>>>> Guillaume Quintard
>>>>>
>>>>> On Tue, Jun 27, 2017 at 11:07 PM, Stefano Baldo <
>>>>> stefanobaldo@gmail.com> wrote:
>>>>>
>>>>>> Hi Guillaume.
>>>>>>
>>>>>> It keeps restarting.
>>>>>> Would you mind taking a quick look at the following VCL file to check
>>>>>> if you find anything suspicious?
>>>>>>
>>>>>> Thank you very much.
>>>>>>
>>>>>> Best,
>>>>>> Stefano
>>>>>>
>>>>>> vcl 4.0;
>>>>>>
>>>>>> import std;
>>>>>>
>>>>>> backend default {
>>>>>>   .host = "sites-web-server-lb";
>>>>>>   .port = "80";
>>>>>> }
>>>>>>
>>>>>> include "/etc/varnish/bad_bot_detection.vcl";
>>>>>>
>>>>>> sub vcl_recv {
>>>>>>   call bad_bot_detection;
>>>>>>
>>>>>>   if (req.url == "/nocache" || req.url == "/version") {
>>>>>>     return(pass);
>>>>>>   }
>>>>>>
>>>>>>   unset req.http.Cookie;
>>>>>>   if (req.method == "PURGE") {
>>>>>>     ban("obj.http.x-host == " + req.http.host + " && obj.http.x-user-agent !~ Googlebot");
>>>>>>     return(synth(750));
>>>>>>   }
>>>>>>
>>>>>>   set req.url = regsuball(req.url, "(?<!(http:|https))\/+", "/");
>>>>>> }
>>>>>>
>>>>>> sub vcl_synth {
>>>>>>   if (resp.status == 750) {
>>>>>>     set resp.status = 200;
>>>>>>     synthetic("PURGED => " + req.url);
>>>>>>     return(deliver);
>>>>>>   } elsif (resp.status == 501) {
>>>>>>     set resp.status = 200;
>>>>>>     set resp.http.Content-Type = "text/html; charset=utf-8";
>>>>>>     synthetic(std.fileread("/etc/varnish/pages/invalid_domain.html"));
>>>>>>     return(deliver);
>>>>>>   }
>>>>>> }
>>>>>>
>>>>>> sub vcl_backend_response {
>>>>>>   unset beresp.http.Set-Cookie;
>>>>>>   set beresp.http.x-host = bereq.http.host;
>>>>>>   set beresp.http.x-user-agent = bereq.http.user-agent;
>>>>>>
>>>>>>   if (bereq.url == "/themes/basic/assets/theme.min.css"
>>>>>>     || bereq.url == "/api/events/PAGEVIEW"
>>>>>>     || bereq.url ~ "^\/assets\/img\/") {
>>>>>>     set beresp.http.Cache-Control = "max-age=0";
>>>>>>   } else {
>>>>>>     unset beresp.http.Cache-Control;
>>>>>>   }
>>>>>>
>>>>>>   if (beresp.status == 200 ||
>>>>>>     beresp.status == 301 ||
>>>>>>     beresp.status == 302 ||
>>>>>>     beresp.status == 404) {
>>>>>>       if (bereq.url ~ "\&ordenar=aleatorio$") {
>>>>>>         set beresp.http.X-TTL = "1d";
>>>>>>         set beresp.ttl = 1d;
>>>>>>       } else {
>>>>>>         set beresp.http.X-TTL = "1w";
>>>>>>         set beresp.ttl = 1w;
>>>>>>       }
>>>>>>   }
>>>>>>
>>>>>>   if (bereq.url !~ "\.(jpeg|jpg|png|gif|gz|tgz|bz2|tbz|mp3|ogg|swf|flv)$") {
>>>>>>     set beresp.do_gzip = true;
>>>>>>   }
>>>>>> }
>>>>>>
>>>>>> sub vcl_pipe {
>>>>>>   set bereq.http.connection = "close";
>>>>>>   return (pipe);
>>>>>> }
>>>>>>
>>>>>> sub vcl_deliver {
>>>>>>   unset resp.http.x-host;
>>>>>>   unset resp.http.x-user-agent;
>>>>>> }
>>>>>>
>>>>>> sub vcl_backend_error {
>>>>>>   if (beresp.status == 502 || beresp.status == 503 || beresp.status == 504) {
>>>>>>     set beresp.status = 200;
>>>>>>     set beresp.http.Content-Type = "text/html; charset=utf-8";
>>>>>>     synthetic(std.fileread("/etc/varnish/pages/maintenance.html"));
>>>>>>     return (deliver);
>>>>>>   }
>>>>>> }
>>>>>>
>>>>>> sub vcl_hash {
>>>>>>   if (req.http.User-Agent ~ "Google Page Speed") {
>>>>>>     hash_data("Google Page Speed");
>>>>>>   } elsif (req.http.User-Agent ~ "Googlebot") {
>>>>>>     hash_data("Googlebot");
>>>>>>   }
>>>>>> }
>>>>>>
>>>>>> sub vcl_deliver {
>>>>>>   if (resp.status == 501) {
>>>>>>     return (synth(resp.status));
>>>>>>   }
>>>>>>   if (obj.hits > 0) {
>>>>>>     set resp.http.X-Cache = "hit";
>>>>>>   } else {
>>>>>>     set resp.http.X-Cache = "miss";
>>>>>>   }
>>>>>> }
>>>>>>
>>>>>>
>>>>>> On Mon, Jun 26, 2017 at 3:47 PM, Guillaume Quintard <
>>>>>> guillaume@varnish-software.com> wrote:
>>>>>>
>>>>>>> Nice! It may have been the cause, time will tell. Can you report back
>>>>>>> in a few days to let us know?
>>>>>>> --
>>>>>>> Guillaume Quintard
>>>>>>>
>>>>>>> On Jun 26, 2017 20:21, "Stefano Baldo" <stefanobaldo@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi Guillaume.
>>>>>>>>
>>>>>>>> I think things will start to go better now after changing the
>>>>>>>> bans.
>>>>>>>> This is what my last varnishstat looked like moments before a crash
>>>>>>>> regarding the bans:
>>>>>>>>
>>>>>>>> MAIN.bans                     41336          .   Count of bans
>>>>>>>> MAIN.bans_completed           37967          .   Number of bans
>>>>>>>> marked 'completed'
>>>>>>>> MAIN.bans_obj                     0          .   Number of bans
>>>>>>>> using obj.*
>>>>>>>> MAIN.bans_req                 41335          .   Number of bans
>>>>>>>> using req.*
>>>>>>>> MAIN.bans_added               41336         0.68 Bans added
>>>>>>>> MAIN.bans_deleted                 0         0.00 Bans deleted
>>>>>>>>
>>>>>>>> And this is how it looks now:
>>>>>>>>
>>>>>>>> MAIN.bans                         2          .   Count of bans
>>>>>>>> MAIN.bans_completed               1          .   Number of bans
>>>>>>>> marked 'completed'
>>>>>>>> MAIN.bans_obj                     2          .   Number of bans
>>>>>>>> using obj.*
>>>>>>>> MAIN.bans_req                     0          .   Number of bans
>>>>>>>> using req.*
>>>>>>>> MAIN.bans_added                2016         0.69 Bans added
>>>>>>>> MAIN.bans_deleted              2014         0.69 Bans deleted
>>>>>>>>
>>>>>>>> Before the changes, bans were never deleted!
>>>>>>>> Now the bans are added and quickly deleted after a minute or even a
>>>>>>>> couple of seconds.
>>>>>>>>
>>>>>>>> Maybe this was the cause of the problem? It seems like varnish was
>>>>>>>> having a large number of bans to manage and test against.
>>>>>>>> I will let it ride now. Let's see if the problem persists or it's
>>>>>>>> gone! :-)
>>>>>>>>
>>>>>>>> Best,
>>>>>>>> Stefano
>>>>>>>>
>>>>>>>>
>>>>>>>> On Mon, Jun 26, 2017 at 3:10 PM, Guillaume Quintard <
>>>>>>>> guillaume@varnish-software.com> wrote:
>>>>>>>>
>>>>>>>>> Looking good!
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Guillaume Quintard
>>>>>>>>>
>>>>>>>>> On Mon, Jun 26, 2017 at 7:06 PM, Stefano Baldo <
>>>>>>>>> stefanobaldo@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Guillaume,
>>>>>>>>>>
>>>>>>>>>> Can the following be considered "ban lurker friendly"?
>>>>>>>>>>
>>>>>>>>>> sub vcl_backend_response {
>>>>>>>>>>   set beresp.http.x-url = bereq.http.host + bereq.url;
>>>>>>>>>>   set beresp.http.x-user-agent = bereq.http.user-agent;
>>>>>>>>>> }
>>>>>>>>>>
>>>>>>>>>> sub vcl_recv {
>>>>>>>>>>   if (req.method == "PURGE") {
>>>>>>>>>>     ban("obj.http.x-url == " + req.http.host + req.url + " && obj.http.x-user-agent !~ Googlebot");
>>>>>>>>>>     return(synth(750));
>>>>>>>>>>   }
>>>>>>>>>> }
>>>>>>>>>>
>>>>>>>>>> sub vcl_deliver {
>>>>>>>>>>   unset resp.http.x-url;
>>>>>>>>>>   unset resp.http.x-user-agent;
>>>>>>>>>> }
>>>>>>>>>>
>>>>>>>>>> Best,
>>>>>>>>>> Stefano
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Mon, Jun 26, 2017 at 12:43 PM, Guillaume Quintard <
>>>>>>>>>> guillaume@varnish-software.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Not lurker friendly at all indeed. You'll need to avoid req.*
>>>>>>>>>>> expression. Easiest way is to stash the host, user-agent and url in
>>>>>>>>>>> beresp.http.* and ban against those (unset them in vcl_deliver).
>>>>>>>>>>>
>>>>>>>>>>> I don't think you need to expand the VSL at all.
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> Guillaume Quintard
>>>>>>>>>>>
>>>>>>>>>>> On Jun 26, 2017 16:51, "Stefano Baldo" <stefanobaldo@gmail.com>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>> Hi Guillaume.
>>>>>>>>>>>
>>>>>>>>>>> Thanks for answering.
>>>>>>>>>>>
>>>>>>>>>>> I'm using an SSD disk. I've changed from ext4 to ext2 to increase
>>>>>>>>>>> performance but it still keeps restarting.
>>>>>>>>>>> Also, I checked the I/O performance of the disk and there is no
>>>>>>>>>>> sign of overload.
>>>>>>>>>>>
>>>>>>>>>>> I've changed the /var/lib/varnish to a tmpfs and increased its
>>>>>>>>>>> 80m default size passing "-l 200m,20m" to varnishd and using
>>>>>>>>>>> "nodev,nosuid,noatime,size=256M 0 0" for the tmpfs mount. There
>>>>>>>>>>> was a problem here. After a couple of hours varnish died and I received a
>>>>>>>>>>> "no space left on device" message - deleting the /var/lib/varnish solved
>>>>>>>>>>> the problem and varnish was up again, but it's weird because there was free
>>>>>>>>>>> memory on the host to be used with the tmpfs directory, so I don't know
>>>>>>>>>>> what could have happened. I will stop increasing the
>>>>>>>>>>> /var/lib/varnish size for now.
>>>>>>>>>>>
>>>>>>>>>>> Anyway, I am worried about the bans. You asked me if the bans
>>>>>>>>>>> are lurker friendly. Well, I don't think so. My bans are created this way:
>>>>>>>>>>>
>>>>>>>>>>> ban("req.http.host == " + req.http.host + " && req.url ~ " + req.url + " && req.http.User-Agent !~ Googlebot");
>>>>>>>>>>>
>>>>>>>>>>> Are they lurker friendly? I was taking a quick look at the
>>>>>>>>>>> documentation and it looks like they're not.
>>>>>>>>>>>
>>>>>>>>>>> Best,
>>>>>>>>>>> Stefano
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Fri, Jun 23, 2017 at 11:30 AM, Guillaume Quintard <
>>>>>>>>>>> guillaume@varnish-software.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi Stefano,
>>>>>>>>>>>>
>>>>>>>>>>>> Let's cover the usual suspects: I/Os. I think here Varnish gets
>>>>>>>>>>>> stuck trying to push/pull data and can't make time to reply to the CLI. I'd
>>>>>>>>>>>> recommend monitoring the disk activity (bandwidth and iops) to confirm.
>>>>>>>>>>>>
>>>>>>>>>>>> After some time, the file storage is terrible on a hard drive
>>>>>>>>>>>> (SSDs take a bit more time to degrade) because of fragmentation. One
>>>>>>>>>>>> solution to help the disks cope is to overprovision them if they're SSDs,
>>>>>>>>>>>> and you can try different advice options in the file storage definition on the
>>>>>>>>>>>> command line (last parameter, after granularity).
>>>>>>>>>>>>
>>>>>>>>>>>> Is your /var/lib/varnish mount on tmpfs? That could help too.
>>>>>>>>>>>>
>>>>>>>>>>>> 40K bans is a lot, are they ban-lurker friendly?
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> Guillaume Quintard
>>>>>>>>>>>>
>>>>>>>>>>>> On Fri, Jun 23, 2017 at 4:01 PM, Stefano Baldo <
>>>>>>>>>>>> stefanobaldo@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hello.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I am having a critical problem with Varnish Cache in
>>>>>>>>>>>>> production for over a month and any help will be appreciated.
>>>>>>>>>>>>> The problem is that Varnish child process is recurrently being
>>>>>>>>>>>>> restarted after 10~20h of use, with the following message:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Jun 23 09:15:13 b858e4a8bd72 varnishd[11816]: Child (11824)
>>>>>>>>>>>>> not responding to CLI, killed it.
>>>>>>>>>>>>> Jun 23 09:15:13 b858e4a8bd72 varnishd[11816]: Unexpected reply
>>>>>>>>>>>>> from ping: 400 CLI communication error
>>>>>>>>>>>>> Jun 23 09:15:13 b858e4a8bd72 varnishd[11816]: Child (11824)
>>>>>>>>>>>>> died signal=9
>>>>>>>>>>>>> Jun 23 09:15:14 b858e4a8bd72 varnishd[11816]: Child cleanup
>>>>>>>>>>>>> complete
>>>>>>>>>>>>> Jun 23 09:15:14 b858e4a8bd72 varnishd[11816]: Child (24038)
>>>>>>>>>>>>> Started
>>>>>>>>>>>>> Jun 23 09:15:14 b858e4a8bd72 varnishd[11816]: Child (24038)
>>>>>>>>>>>>> said Child starts
>>>>>>>>>>>>> Jun 23 09:15:14 b858e4a8bd72 varnishd[11816]: Child (24038)
>>>>>>>>>>>>> said SMF.s0 mmap'ed 483183820800 bytes of 483183820800
>>>>>>>>>>>>>
>>>>>>>>>>>>> The following link is the varnishstat output just 1 minute
>>>>>>>>>>>>> before a restart:
>>>>>>>>>>>>>
>>>>>>>>>>>>> https://pastebin.com/g0g5RVTs
>>>>>>>>>>>>>
>>>>>>>>>>>>> Environment:
>>>>>>>>>>>>>
>>>>>>>>>>>>> varnish-5.1.2 revision 6ece695
>>>>>>>>>>>>> Debian 8.7 - Debian GNU/Linux 8 (3.16.0)
>>>>>>>>>>>>> Installed using pre-built package from official repo at
>>>>>>>>>>>>> packagecloud.io
>>>>>>>>>>>>> CPU 2x2.9 GHz
>>>>>>>>>>>>> Mem 3.69 GiB
>>>>>>>>>>>>> Running inside a Docker container
>>>>>>>>>>>>> NFILES=131072
>>>>>>>>>>>>> MEMLOCK=82000
>>>>>>>>>>>>>
>>>>>>>>>>>>> Additional info:
>>>>>>>>>>>>>
>>>>>>>>>>>>> - I need to cache a large number of objets and the cache
>>>>>>>>>>>>> should last for almost a week, so I have set up a 450G storage space, I
>>>>>>>>>>>>> don't know if this is a problem;
>>>>>>>>>>>>> - I use ban a lot. There was about 40k bans in the system just
>>>>>>>>>>>>> before the last crash. I really don't know if this is too much or may have
>>>>>>>>>>>>> anything to do with it;
>>>>>>>>>>>>> - No registered CPU spikes (almost always by 30%);
>>>>>>>>>>>>> - No panic is reported, the only info I can retrieve is from
>>>>>>>>>>>>> syslog;
>>>>>>>>>>>>> - During all this time, even moments before the crashes,
>>>>>>>>>>>>> everything is okay and requests are being answered very fast.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Best,
>>>>>>>>>>>>> Stefano Baldo
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>>> varnish-misc mailing list
>>>>>>>>>>>>> varnish-misc@varnish-cache.org
>>>>>>>>>>>>> https://www.varnish-cache.org/lists/mailman/listinfo/varnish-misc
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>>
>>
>>
>

unstable:0</div><div>Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.382318]   \
free:16797 slab_reclaimable:15276 slab_unreclaimable:10521</div><div>Jun 29 13:11:01 \
ip-172-25-2-8 kernel: [93823.382318]   mapped:22002 shmem:22935 pagetables:30362 \
bounce:0</div><div>Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.382318]   \
free_cma:0</div><div>Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.397242] Node 0 DMA \
free:15192kB min:184kB low:228kB high:276kB active_anon:416kB inactive_anon:60kB \
active_file:0kB inactive_file:0kB unevictable:20kB isolated(anon):0kB \
isolated(file):0kB present:15988kB managed:15904kB mlocked:20kB dirty:0kB \
writeback:0kB mapped:20kB shmem:80kB slab_reclaimable:32kB slab_unreclaimable:0kB \
kernel_stack:112kB pagetables:20kB unstable:0kB bounce:0kB free_cma:0kB \
writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes</div><div>Jun 29 13:11:01 \
ip-172-25-2-8 kernel: [93823.416338] lowmem_reserve[]: 0 3757 3757 3757</div><div>Jun \
29 13:11:01 ip-172-25-2-8 kernel: [93823.419030] Node 0 DMA32 free:50120kB \
min:44868kB low:56084kB high:67300kB active_anon:3386780kB inactive_anon:7592kB \
active_file:1732kB inactive_file:2060kB unevictable:82924kB isolated(anon):0kB \
isolated(file):128kB present:3915776kB managed:3849676kB mlocked:82924kB dirty:108kB \
writeback:0kB mapped:88432kB shmem:91660kB slab_reclaimable:61072kB \
slab_unreclaimable:42184kB kernel_stack:27248kB pagetables:121428kB unstable:0kB \
bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? \
no</div><div>Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.440095] lowmem_reserve[]: 0 \
0 0 0</div><div>Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.442202] Node 0 DMA: \
22*4kB (UEM) 6*8kB (EM) 1*16kB (E) 2*32kB (UM) 2*64kB (UE) 2*128kB (EM) 3*256kB (UEM) \
1*512kB (E) 3*1024kB (UEM) 3*2048kB (EMR) 1*4096kB (M) = 15192kB</div><div>Jun 29 \
13:11:01 ip-172-25-2-8 kernel: [93823.451936] Node 0 DMA32: 4031*4kB (EM) 2729*8kB \
(EM) 324*16kB (EM) 1*32kB (R) 1*64kB (R) 0*128kB 0*256kB 1*512kB (R) 1*1024kB (R) \
1*2048kB (R) 0*4096kB = 46820kB</div><div>Jun 29 13:11:01 ip-172-25-2-8 kernel: \
[93823.460240] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 \
hugepages_size=2048kB</div><div>Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.464122] \
24240 total pagecache pages</div><div>Jun 29 13:11:01 ip-172-25-2-8 kernel: \
[93823.466048] 0 pages in swap cache</div><div>Jun 29 13:11:01 ip-172-25-2-8 kernel: \
[93823.467672] Swap cache stats: add 0, delete 0, find 0/0</div><div>Jun 29 13:11:01 \
ip-172-25-2-8 kernel: [93823.470159] Free swap   = 0kB</div><div>Jun 29 13:11:01 \
ip-172-25-2-8 kernel: [93823.471513] Total swap = 0kB</div><div>Jun 29 13:11:01 \
ip-172-25-2-8 kernel: [93823.472980] 982941 pages RAM</div><div>Jun 29 13:11:01 \
ip-172-25-2-8 kernel: [93823.474380] 0 pages HighMem/MovableOnly</div><div>Jun 29 \
13:11:01 ip-172-25-2-8 kernel: [93823.476190] 16525 pages reserved</div><div>Jun 29 \
13:11:01 ip-172-25-2-8 kernel: [93823.477772] 0 pages hwpoisoned</div><div>Jun 29 \
13:11:01 ip-172-25-2-8 kernel: [93823.479189] [ pid ]    uid   tgid total_vm         \
rss nr_ptes swapents oom_score_adj name</div><div>Jun 29 13:11:01 ip-172-25-2-8 \
kernel: [93823.482698] [   163]       0    163      10419       1295         21       \
0                   0 systemd-journal</div><div>Jun 29 13:11:01 ip-172-25-2-8 kernel: \
[93823.486646] [   165]       0    165      10202         136         21            0 \
-1000 systemd-udevd</div><div>Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.490598] [  \
294]       0    294       6351       1729         14            0                   0 \
dhclient</div><div>Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.494457] [   319]      \
0    319       6869          62         18            0                   0 \
cron</div><div>Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.498260] [   321]       0  \
321       4964          67         14            0                   0 \
systemd-logind</div><div>Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.502346] [   \
326]    105    326      10558         101         25            0               -900 \
dbus-daemon</div><div>Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.506315] [   342]   \
0    342      65721         228         31            0                   0 \
rsyslogd</div><div>Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.510222] [   343]      \
0    343      88199       2108         61            0               -500 \
dockerd</div><div>Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.514022] [   350]    \
106    350      18280         181         36            0                   0 \
zabbix_agentd</div><div>Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.518040] [   351] \
106    351      18280         475         36            0                   0 \
zabbix_agentd</div><div>Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.522041] [   352] \
106    352      18280         187         36            0                   0 \
zabbix_agentd</div><div>Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.526025] [   353] \
106    353      18280         187         36            0                   0 \
zabbix_agentd</div><div>Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.530067] [   354] \
106    354      18280         187         36            0                   0 \
zabbix_agentd</div><div>Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.534033] [   355] \
106    355      18280         190         36            0                   0 \
zabbix_agentd</div><div>Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.538001] [   358] \
0    358      66390       1826         32            0                   0 \
fail2ban-server</div><div>Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.541972] [   \
400]       0    400      35984         444         24            0               -500 \
docker-containe</div><div>Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.545879] [   \
568]       0    568      13796         168         30            0             -1000 \
sshd</div><div>Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.549733] [   576]       0  \
576       3604          41         12            0                   0 \
agetty</div><div>Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.553569] [   577]       \
0    577       3559          38         12            0                   0 \
agetty</div><div>Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.557322] [16201]       0 \
16201      29695      20707         60            0                   0 \
varnishd</div><div>Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.561103] [16209]    \
108 16209 118909802    822425    29398            0                   0 \
cache-main</div><div>Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.565002] [27352]     \
0 27352      20131         214         42            0                   0 \
sshd</div><div>Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.568682] [27354]   1000 \
27354      20165         211         41            0                   0 \
sshd</div><div>Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.572307] [27355]   1000 \
27355       5487         146         17            0                   0 \
bash</div><div>Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.575920] [27360]       0 \
27360      11211         107         26            0                   0 \
sudo</div><div>Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.579593] [27361]       0 \
27361      11584          97         27            0                   0 \
su</div><div>Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.583155] [27362]       0 \
27362       5481         142         15            0                   0 \
bash</div><div>Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.586782] [27749]       0 \
27749      20131         214         41            0                   0 \
sshd</div><div>Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.590428] [27751]   1000 \
27751      20164         211         39            0                   0 \
sshd</div><div>Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.593979] [27752]   1000 \
27752       5487         147         15            0                   0 \
bash</div><div>Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.597488] [28762]       0 \
28762      26528         132         17            0                   0 \
varnishstat</div><div>Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.601239] [28764]    \
0 28764      11211         106         26            0                   0 \
sudo</div><div>Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.604737] [28765]       0 \
28765      11584          97         26            0                   0 \
su</div><div>Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.608602] [28766]       0 \
28766       5481         141         15            0                   0 \
bash</div><div>Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.612288] [28768]       0 \
28768      26528         220         18            0                   0 \
varnishstat</div><div>Jun 29 13:11:01 ip-172-25-2-8 kernel: [93823.616189] Out of \
memory: Kill process 16209 (cache-main) score 880 or sacrifice child</div><div>Jun 29 \
13:11:01 ip-172-25-2-8 kernel: [93823.620106] Killed process 16209 (cache-main) \
total-vm:475639208kB, anon-rss:3289700kB, file-rss:0kB</div><div>Jun 29 13:11:01 \
ip-172-25-2-8 varnishd[16201]: Child (16209) died signal=9</div><div>Jun 29 13:11:01 \
ip-172-25-2-8 varnishd[16201]: Child cleanup complete</div><div>Jun 29 13:11:01 \
ip-172-25-2-8 varnishd[16201]: Child (30313) Started</div><div>Jun 29 13:11:01 \
ip-172-25-2-8 varnishd[16201]: Child (30313) said Child starts</div><div>Jun 29 \
13:11:01 ip-172-25-2-8 varnishd[16201]: Child (30313) said SMF.s0 mmap&#39;ed \
483183820800 bytes of \
483183820800</div></div><div><br></div><div>Best,</div><div>Stefano</div><div><br></div></div><div \
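A back-of-the-envelope conversion of the numbers in the kill report above makes the failure mode clearer: the enormous total_vm is dominated by the mmap'ed file storage, while the resident anonymous memory that actually exhausted the box is only about 3 GiB. A sketch (the values are copied from the log; the unit conversions are the only additions):

```python
# Figures copied from the OOM report and varnishd restart lines above (kB / bytes).
total_vm_kb = 475_639_208      # cache-main virtual size at kill time
anon_rss_kb = 3_289_700        # resident anonymous memory at kill time
mem_total_kb = 3_865_572       # MemTotal reported elsewhere in this thread
smf_bytes = 483_183_820_800    # "SMF.s0 mmap'ed ... bytes of ..."

print(f"total_vm     ~ {total_vm_kb / 2**20:.0f} GiB")   # ~454 GiB, mostly the storage mapping
print(f"anon-rss     ~ {anon_rss_kb / 2**20:.1f} GiB")   # ~3.1 GiB actually resident
print(f"MemTotal     ~ {mem_total_kb / 2**20:.1f} GiB")  # ~3.7 GiB of physical RAM, no swap
print(f"file storage = {smf_bytes / 2**30:.0f} GiB")     # exactly 450 GiB
```

The mapping itself is file-backed and reclaimable; what killed the child is anonymous memory (thread stacks, workspaces, object bookkeeping) on a swapless ~3.7 GiB machine.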
>
> On Wed, Jun 28, 2017 at 11:33 AM, Reza Naghibi <reza@varnish-software.com> wrote:
>
>> Assuming the problem is running out of memory, you will need to do some
>> memory tuning, especially given the number of threads you are using and
>> your access patterns. Your options:
>>
>> - Add more memory to the system
>> - Reduce thread_pool_max
>> - Reduce jemalloc's thread cache (MALLOC_CONF="lg_tcache_max:10")
>> - Use some of the tuning params in here:
>>   https://info.varnish-software.com/blog/understanding-varnish-cache-memory-usage
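To see why thread_pool_max matters here, a rough worst-case estimate of thread stack memory can be sketched. The pool count and stack size below are assumed Varnish defaults, not values confirmed in this thread (they can be checked with `varnishadm param.show`):

```python
# Assumed defaults, not confirmed in this thread.
thread_pools = 2         # default number of thread pools
thread_pool_max = 5000   # default per-pool thread cap
stack_kb = 48            # assumed thread_pool_stack per worker thread

worst_case_kb = thread_pools * thread_pool_max * stack_kb
print(f"worst-case thread stacks ~ {worst_case_kb / 1024:.0f} MiB")  # ~469 MiB
```

On top of the stacks, each worker thread keeps its own jemalloc thread cache, which is exactly what the lg_tcache_max suggestion shrinks; on a machine with under 4 GiB of RAM both terms are material.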
>>
>> --
>> Reza Naghibi
>> Varnish Software
>>
>> On Wed, Jun 28, 2017 at 9:26 AM, Guillaume Quintard <guillaume@varnish-software.com> wrote:
>>> On Wed, Jun 28, 2017 at 3:20 PM, Stefano Baldo <stefanobaldo@gmail.com> wrote:
>>>
>>>> Hi Guillaume.
>>>>
>>>> I increased the cli_timeout yesterday to 900sec (15min) and it restarted
>>>> anyway, which seems to indicate that the thread is really stalled.
>>>>
>>>> This was 1 minute after the last restart:
>>>>
>>>> MAIN.n_object        3908216    .  object structs made
>>>> SMF.s0.g_alloc       7794510    .  Allocations outstanding
>>>>
>>>> I've just changed the I/O scheduler to noop to see what happens.
>>>>
>>>> One interesting thing I've found is about the memory usage.
>>>>
>>>> In the 1st minute of use:
>>>>
>>>> MemTotal:       3865572 kB
>>>> MemFree:         120768 kB
>>>> MemAvailable:   2300268 kB
>>>>
>>>> 1 minute before a restart:
>>>>
>>>> MemTotal:       3865572 kB
>>>> MemFree:          82480 kB
>>>> MemAvailable:     68316 kB
>>>>
>>>> It seems like the system is possibly running out of memory.
>>>>
>>>> When calling varnishd, I'm specifying only "-s file,..." as storage. I see
>>>> in some examples that it is common to use "-s file" AND "-s malloc"
>>>> together. Should I be passing "-s malloc" as well to somehow try to limit
>>>> the memory usage by varnishd?
>>>>
>>>> Best,
>>>> Stefano
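The two /proc/meminfo snapshots above already quantify the growth; MemAvailable is the number to watch, since MemFree stays low on a healthy box simply because of the page cache. The same arithmetic as a sketch:

```python
# MemAvailable from the two snapshots quoted above (kB).
avail_after_start = 2_300_268   # first minute after a child start
avail_before_kill = 68_316      # one minute before the next restart

drained_kb = avail_after_start - avail_before_kill
print(f"~ {drained_kb / 2**20:.1f} GiB of reclaimable memory consumed between restarts")
```

On the `-s malloc` question: as generally documented, the size argument of a malloc storage caps only that backend's cache bytes, not the varnishd process as a whole, so adding it would not by itself bound what the OOM killer sees.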
>>>>
>>>> On Wed, Jun 28, 2017 at 4:12 AM, Guillaume Quintard <guillaume@varnish-software.com> wrote:
>>>>
>>>>> Sadly, nothing suspicious here. You can still try:
>>>>> - bumping the cli_timeout
>>>>> - changing your disk scheduler
>>>>> - changing the advice option of the file storage
>>>>>
>>>>> I'm still convinced this is due to Varnish getting stuck waiting for the
>>>>> disk because of the file storage fragmentation.
>>>>>
>>>>> Maybe you could look at SMF.*.g_alloc and compare it to the number of
>>>>> objects. Ideally, we would have a 1:1 relation between objects and
>>>>> allocations. If that number drops prior to a restart, that would be a
>>>>> good clue.
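This 1:1 check can be done directly with the MAIN.n_object and SMF.s0.g_alloc counters quoted earlier in the thread; hard-coding them for illustration:

```python
n_object = 3_908_216   # MAIN.n_object quoted earlier in this thread
g_alloc = 7_794_510    # SMF.s0.g_alloc quoted earlier in this thread

ratio = g_alloc / n_object
print(f"{ratio:.2f} storage allocations per object")  # ~1.99, i.e. roughly 2 fragments per object
```

Live, the same comparison can be scripted from `varnishstat -1` or `varnishstat -j` output; a ratio drifting well above 1 is the fragmentation signal described here.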
>>>>>
>>>>> --
>>>>> Guillaume Quintard
>>>>>
>>>>> On Tue, Jun 27, 2017 at 11:07 PM, Stefano Baldo <stefanobaldo@gmail.com> wrote:
>>>>>> Hi Guillaume.
>>>>>>
>>>>>> It keeps restarting.
>>>>>> Would you mind taking a quick look at the following VCL file to check if
>>>>>> you find anything suspicious?
>>>>>>
>>>>>> Thank you very much.
>>>>>>
>>>>>> Best,
>>>>>> Stefano
>>>>>>
>>>>>> vcl 4.0;
>>>>>>
>>>>>> import std;
>>>>>>
>>>>>> backend default {
>>>>>>    .host = "sites-web-server-lb";
>>>>>>    .port = "80";
>>>>>> }
>>>>>>
>>>>>> include "/etc/varnish/bad_bot_detection.vcl";
>>>>>>
>>>>>> sub vcl_recv {
>>>>>>    call bad_bot_detection;
>>>>>>
>>>>>>    if (req.url == "/nocache" || req.url == "/version") {
>>>>>>       return(pass);
>>>>>>    }
>>>>>>
>>>>>>    unset req.http.Cookie;
>>>>>>    if (req.method == "PURGE") {
>>>>>>       ban("obj.http.x-host == " + req.http.host + " && obj.http.x-user-agent !~ Googlebot");
>>>>>>       return(synth(750));
>>>>>>    }
>>>>>>
>>>>>>    set req.url = regsuball(req.url, "(?<!(http:|https))\/+", "/");
>>>>>> }
>>>>>>
>>>>>> sub vcl_synth {
>>>>>>    if (resp.status == 750) {
>>>>>>       set resp.status = 200;
>>>>>>       synthetic("PURGED => " + req.url);
>>>>>>       return(deliver);
>>>>>>    } elsif (resp.status == 501) {
>>>>>>       set resp.status = 200;
>>>>>>       set resp.http.Content-Type = "text/html; charset=utf-8";
>>>>>>       synthetic(std.fileread("/etc/varnish/pages/invalid_domain.html"));
>>>>>>       return(deliver);
>>>>>>    }
>>>>>> }
>>>>>>
>>>>>> sub vcl_backend_response {
>>>>>>    unset beresp.http.Set-Cookie;
>>>>>>    set beresp.http.x-host = bereq.http.host;
>>>>>>    set beresp.http.x-user-agent = bereq.http.user-agent;
>>>>>>
>>>>>>    if (bereq.url == "/themes/basic/assets/theme.min.css"
>>>>>>       || bereq.url == "/api/events/PAGEVIEW"
>>>>>>       || bereq.url ~ "^\/assets\/img\/") {
>>>>>>       set beresp.http.Cache-Control = "max-age=0";
>>>>>>    } else {
>>>>>>       unset beresp.http.Cache-Control;
>>>>>>    }
>>>>>>
>>>>>>    if (beresp.status == 200 ||
>>>>>>       beresp.status == 301 ||
>>>>>>       beresp.status == 302 ||
>>>>>>       beresp.status == 404) {
>>>>>>          if (bereq.url ~ "\&ordenar=aleatorio$") {
>>>>>>             set beresp.http.X-TTL = "1d";
>>>>>>             set beresp.ttl = 1d;
>>>>>>          } else {
>>>>>>             set beresp.http.X-TTL = "1w";
>>>>>>             set beresp.ttl = 1w;
>>>>>>          }
>>>>>>    }
>>>>>>
>>>>>>    if (bereq.url !~ "\.(jpeg|jpg|png|gif|gz|tgz|bz2|tbz|mp3|ogg|swf|flv)$") {
>>>>>>       set beresp.do_gzip = true;
>>>>>>    }
>>>>>> }
>>>>>>
>>>>>> sub vcl_pipe {
>>>>>>    set bereq.http.connection = "close";
>>>>>>    return (pipe);
>>>>>> }
>>>>>>
>>>>>> sub vcl_deliver {
>>>>>>    unset resp.http.x-host;
>>>>>>    unset resp.http.x-user-agent;
>>>>>> }
>>>>>>
>>>>>> sub vcl_backend_error {
>>>>>>    if (beresp.status == 502 || beresp.status == 503 || beresp.status == 504) {
>>>>>>       set beresp.status = 200;
>>>>>>       set beresp.http.Content-Type = "text/html; charset=utf-8";
>>>>>>       synthetic(std.fileread("/etc/varnish/pages/maintenance.html"));
>>>>>>       return (deliver);
>>>>>>    }
>>>>>> }
>>>>>>
>>>>>> sub vcl_hash {
>>>>>>    if (req.http.User-Agent ~ "Google Page Speed") {
>>>>>>       hash_data("Google Page Speed");
>>>>>>    } elsif (req.http.User-Agent ~ "Googlebot") {
>>>>>>       hash_data("Googlebot");
>>>>>>    }
>>>>>> }
>>>>>>
>>>>>> sub vcl_deliver {
>>>>>>    if (resp.status == 501) {
>>>>>>       return (synth(resp.status));
>>>>>>    }
>>>>>>    if (obj.hits > 0) {
>>>>>>       set resp.http.X-Cache = "hit";
>>>>>>    } else {
>>>>>>       set resp.http.X-Cache = "miss";
>>>>>>    }
>>>>>> }
>>>>>>
>>>>>> On Mon, Jun 26, 2017 at 3:47 PM, Guillaume Quintard <guillaume@varnish-software.com> wrote:
>>>>>>
>>>>>>> Nice! It may have been the cause, time will tell. Can you report back
>>>>>>> in a few days to let us know?
>>>>>>>
>>>>>>> --
>>>>>>> Guillaume Quintard
>>>>>>>
>>>>>>> On Jun 26, 2017 20:21, "Stefano Baldo" <stefanobaldo@gmail.com> wrote:
>>>>>>>> Hi Guillaume.
>>>>>>>>
>>>>>>>> I think things will start going better now after changing the bans.
>>>>>>>> This is how my last varnishstat looked moments before a crash,
>>>>>>>> regarding the bans:
>>>>>>>>
>>>>>>>> MAIN.bans                41336     .   Count of bans
>>>>>>>> MAIN.bans_completed      37967     .   Number of bans marked 'completed'
>>>>>>>> MAIN.bans_obj                0     .   Number of bans using obj.*
>>>>>>>> MAIN.bans_req            41335     .   Number of bans using req.*
>>>>>>>> MAIN.bans_added          41336   0.68  Bans added
>>>>>>>> MAIN.bans_deleted            0   0.00  Bans deleted
>>>>>>>>
>>>>>>>> And this is how it looks now:
>>>>>>>>
>>>>>>>> MAIN.bans                    2     .   Count of bans
>>>>>>>> MAIN.bans_completed          1     .   Number of bans marked 'completed'
>>>>>>>> MAIN.bans_obj                2     .   Number of bans using obj.*
>>>>>>>> MAIN.bans_req                0     .   Number of bans using req.*
>>>>>>>> MAIN.bans_added           2016   0.69  Bans added
>>>>>>>> MAIN.bans_deleted         2014   0.69  Bans deleted
>>>>>>>>
>>>>>>>> Before the changes, bans were never deleted!
>>>>>>>> Now the bans are added and quickly deleted after a minute or even a
>>>>>>>> couple of seconds.
>>>>>>>>
>>>>>>>> Could this have been the cause of the problem? It seems like varnish
>>>>>>>> was having a large number of bans to manage and test against.
>>>>>>>> I will let it ride now. Let's see if the problem persists or it's
>>>>>>>> gone! :-)
>>>>>>>>
>>>>>>>> Best,
>>>>>>>> Stefano
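The before/after counters make the failure arithmetic easy to check: bans that test req.* fields cannot be evaluated in the background by the ban lurker, so at the observed add rate with zero deletions the list simply grows. A sketch with the numbers above:

```python
bans = 41_336     # MAIN.bans shortly before a crash
add_rate = 0.68   # MAIN.bans_added per-second rate from the same output

hours_to_accumulate = bans / add_rate / 3600
print(f"~ {hours_to_accumulate:.0f} hours to accumulate {bans} bans")  # ~17 hours
```

That lines up suggestively with the 10~20h restart cadence reported at the start of this thread, and every one of those req.* bans had to be tested against objects at lookup time.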
>>>>>>>>
>>>>>>>> On Mon, Jun 26, 2017 at 3:10 PM, Guillaume Quintard <guillaume@varnish-software.com> wrote:
>>>>>>>>
>>>>>>>>> Looking good!
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Guillaume Quintard
>>>>>>>>>
>>>>>>>>> On Mon, Jun 26, 2017 at 7:06 PM, Stefano Baldo <stefanobaldo@gmail.com> wrote:
>>>>>>>>>> Hi Guillaume,
>>>>>>>>>>
>>>>>>>>>> Can the following be considered "ban lurker friendly"?
>>>>>>>>>>
>>>>>>>>>> sub vcl_backend_response {
>>>>>>>>>>    set beresp.http.x-url = bereq.http.host + bereq.url;
>>>>>>>>>>    set beresp.http.x-user-agent = bereq.http.user-agent;
>>>>>>>>>> }
>>>>>>>>>>
>>>>>>>>>> sub vcl_recv {
>>>>>>>>>>    if (req.method == "PURGE") {
>>>>>>>>>>       ban("obj.http.x-url == " + req.http.host + req.url + " && obj.http.x-user-agent !~ Googlebot");
>>>>>>>>>>       return(synth(750));
>>>>>>>>>>    }
>>>>>>>>>> }
>>>>>>>>>>
>>>>>>>>>> sub vcl_deliver {
>>>>>>>>>>    unset resp.http.x-url;
>>>>>>>>>>    unset resp.http.x-user-agent;
>>>>>>>>>> }
>>>>>>>>>>
>>>>>>>>>> Best,
>>>>>>>>>> Stefano
>>>>>>>>>>
>>>>>>>>>> On Mon, Jun 26, 2017 at 12:43 PM, Guillaume Quintard <guillaume@varnish-software.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Not lurker friendly at all, indeed. You'll need to avoid req.*
>>>>>>>>>>> expressions. The easiest way is to stash the host, user-agent and url
>>>>>>>>>>> in beresp.http.* and ban against those (unset them in vcl_deliver).
>>>>>>>>>>>
>>>>>>>>>>> I don't think you need to expand the VSL at all.
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> Guillaume Quintard
>>>>>>>>>>>
>>>>>>>>>>> On Jun 26, 2017 16:51, "Stefano Baldo" <stefanobaldo@gmail.com> wrote:
>>>>>>>>>>>> Hi Guillaume.
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks for answering.
>>>>>>>>>>>>
>>>>>>>>>>>> I'm using an SSD disk. I've changed from ext4 to ext2 to increase
>>>>>>>>>>>> performance but it is still restarting.
>>>>>>>>>>>> Also, I checked the I/O performance for the disk and there is no
>>>>>>>>>>>> sign of overload.
>>>>>>>>>>>>
>>>>>>>>>>>> I've changed /var/lib/varnish to a tmpfs and increased its 80m
>>>>>>>>>>>> default size, passing "-l 200m,20m" to varnishd and using
>>>>>>>>>>>> "nodev,nosuid,noatime,size=256M 0 0" for the tmpfs mount. There was
>>>>>>>>>>>> a problem here. After a couple of hours
>>>>>>>>>>>>
>>>>>>>>>>>> On Fri, Jun 23, 2017 at 4:01 PM, Stefano Baldo <stefanobaldo@gmail.com> wrote:
critical problem with Varnish Cache in production for over a month and any help will \
be appreciated.</div><div>The problem is that  Varnish child process is recurrently \
being restarted after 10~20h of use, with the following \
message:</div><div><br></div><div><div><div>Jun 23 09:15:13 b858e4a8bd72 \
varnishd[11816]: Child (11824) not responding to CLI, killed it.</div><div>Jun 23 \
09:15:13 b858e4a8bd72 varnishd[11816]: Unexpected reply from ping: 400 CLI \
communication error</div><div>Jun 23 09:15:13 b858e4a8bd72 varnishd[11816]: Child \
(11824) died signal=9</div><div>Jun 23 09:15:14 b858e4a8bd72 varnishd[11816]: Child \
cleanup complete</div><div>Jun 23 09:15:14 b858e4a8bd72 varnishd[11816]: Child \
(24038) Started</div><div>Jun 23 09:15:14 b858e4a8bd72 varnishd[11816]: Child (24038) \
said Child starts</div><div>Jun 23 09:15:14 b858e4a8bd72 varnishd[11816]: Child \
(24038) said SMF.s0 mmap&#39;ed 483183820800 bytes of \
483183820800</div></div><div><br></div><div>The following link is the varnishstat \
output just 1 minute before a restart:</div><div><br></div><div><a \
href="https://pastebin.com/g0g5RVTs" \
target="_blank">https://pastebin.com/g0g5RVTs</a></div><div><br></div><div>Environment:</div><div><br></div><div>varnish-5.1.2 \
revision 6ece695</div><div>Debian 8.7 - Debian GNU/Linux 8 \
(3.16.0)</div><div>Installed using pre-built package from official repo at <a \
href="http://packagecloud.io" target="_blank">packagecloud.io</a></div><div>CPU 2x2.9 \
GHz</div><div>Mem 3.69 GiB</div><div>Running inside a Docker \
container</div><div>NFILES=131072</div><div>MEMLOCK=82000</div><div><br></div><div>Additional \
info:</div><div><br></div><div>- I need to cache a large number of objets and the \
cache should last for almost a week, so I have set up a 450G storage space, I \
don&#39;t know if this is a problem;</div><div>- I use ban a lot. There was about 40k \
bans in the system just before the last crash. I really don&#39;t know if this is too \
much or may have anything to do with it;</div><div>- No registered CPU spikes (almost \
always by 30%);</div><div>- No panic is reported, the only info I can retrieve is \
from syslog;</div><div>- During all the time, event moments before the crashes, \
everything is okay and requests are being responded very \
fast.</div></div><div><br></div><div>Best,</div><div>Stefano \
Baldo</div><div><br></div></div> \
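With hindsight from the rest of the thread, one sizing check is worth sketching against this original setup: a 450G file storage can hold millions of objects, and each cached object also costs RAM for bookkeeping. Assuming the commonly cited rule of thumb of roughly 1 kB of memory overhead per cached object (an assumed figure, not one stated in this thread), and the MAIN.n_object count quoted elsewhere in the thread:

```python
n_object = 3_908_216        # MAIN.n_object quoted elsewhere in this thread
overhead_kb_per_object = 1  # assumed rule-of-thumb per-object RAM overhead

overhead_gib = n_object * overhead_kb_per_object / 2**20
print(f"~ {overhead_gib:.1f} GiB of per-object overhead in RAM")  # ~3.7 GiB
```

Under that assumption the object metadata alone is essentially all of the machine's 3.69 GiB, so regardless of ban handling, this object count on this much RAM sits at the edge of what the box can hold.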
>>>>>>>>>>>>>
>>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>>> varnish-misc mailing list
>>>>>>>>>>>>> varnish-misc@varnish-cache.org
>>>>>>>>>>>>> https://www.varnish-cache.org/lists/mailman/listinfo/varnish-misc



_______________________________________________
varnish-misc mailing list
varnish-misc@varnish-cache.org
https://www.varnish-cache.org/lists/mailman/listinfo/varnish-misc
