[prev in list] [next in list] [prev in thread] [next in thread] 

List:       openjdk-serviceability-dev
Subject:    Re: [PATCH] attach in linux should be relative to /proc/pid/root and namespace aware
From:       TJ Fontaine <tj.fontaine () oracle ! com>
Date:       2017-05-03 15:41:26
Message-ID: 20170503154126.GB61603 () tjfontai-mac
[Download RAW message or body]

On Tue, May 02, 2017 at 02:16:09PM +1000, David Holmes wrote:
> Hi TJ,
> 
> On 1/05/2017 10:22 AM, TJ Fontaine wrote:
> > Hey,
> > 
> > I've attached a version rebased on jdk10, it also (currently) applies cleanly to \
> > jdk9.
> 
> I have filed:
> 
> https://bugs.openjdk.java.net/browse/JDK-8179498
> 
> and have hosted your patch at:
> 
> http://cr.openjdk.java.net/~dholmes/8179498/webrev/
> 
> I find it a little confusing understanding when we need to use pid and when
> we need to use ns_pid. IIUC the client VM doing the attach should use the
> "global" pid in the /proc path but name the file with the ns_pid as seen by
> the target VM - correct? And the query "what is my pid" will actually return
> the ns_pid - correct?

Your understanding is correct, and the confusion is natural. First because
there's not simply not a lot of useful vocabulary for Linux containers
(yet), and second its coupled with just how flexible (i.e. complex) the
environments can be.

I've been calling this the "Container Identity Crisis", there are at least
two distinct identities in play. What the client initiating the attach
believes the identity/pid of the target process is (as it relates to what it
can find in its /proc), and what the target process believes its own
identity to be.

For development tools to be container aware, the calling process should
always address the target process relative to what is reachable through its
/proc (e.g. cwd, root), however if you need to interact with the target
process you may need to be aware of the inner most namespaced pid -- as is
the case with the JVM attach sequence.

> 
> 345         // TODO XXX friggin old kernels may not have NSpid field (i.e.
> 3.10)
> 346         // fallback to original pid in the event we cannot deduce
> 
> > ) Suggest:
> 
> // If no NSpid field found assume no namespace and fallback to original pid

ACK

> 
> Overall the approach seems reasonable, but I can't validate the operational
> details. Lets see what others have to say.
> 
> > While there is no supplied test or harness for this patch, how I built and tested \
> > is available at https://github.com/tjfontaine/jdkbuild (there's also a preview of \
> > my follow on patch for pathmap_open as well).
> 
> The lack of testability is somewhat of a concern. We can of course test that
> it works okay when the namespace is not present, but can't verify it works
> as you intend.
> 
> And of course this doesn't help with Docker containers running under earlier
> versions of the Linux kernel.

Correct, sadly the tools will continue to be broken (for now) if we cannot
deduce the namespaced pid of the target process.

Let me investigate the test infrastructure more to see what can be done with
regards to being able to do more validation.

> 
> Thanks,
> David
> 
> > Thanks!
> > 
> > TJ
> > 
> > On 4/28/17, 3:47 PM, "serviceability-dev on behalf of TJ Fontaine" \
> > <serviceability-dev-bounces@openjdk.java.net on behalf of tj.fontaine@oracle.com> \
> > wrote: 
> > I had no doubt we'd end up on the conversation of 10 -> 9 -> 8u, I started with \
> > 8u merely because it was representative of today's customer pain. I'll be sure to \
> > work on retargeting it as well. 
> > Thanks!
> > 
> > TJ
> > 
> > On 4/28/17, 3:42 PM, "David Holmes" <david.holmes@oracle.com> wrote:
> > 
> > Hi TJ,
> > 
> > Thanks for the patch (I haven't looked at it yet). FYI at the moment,
> > unless this is considered a high priority bug for JDK 9 it has to be
> > targeted to JDK 10, and then possibly backported to 9 and 8u.
> > 
> > Cheers,
> > David
> > 
> > On 29/04/2017 8:23 AM, TJ Fontaine wrote:
> > > I have attached a patch that allows jcmd to work against a java process running
> > > inside a Docker container. Apologies if this is not in the correct format. It \
> > > was built against jdk8u. I couldn't seem to find an existing JIRA for it.
> > > 
> > > Diagnostic commands (i.e. jcmd, jstack, etc) fail to attach to a target JVM
> > > that is inside a container (e.g. Docker).
> > > 
> > > A Linux container often isolates a process in a PID and Mount namespace that is
> > > separate from the "root container" (analogous to the hypervisor/dom0 in
> > > hardware virtualization environments, or the global zone on Solaris). A target
> > > JVM that is isolated in either a PID namespace, or a Mount namespace will fail
> > > the attach sequence.
> > > 
> > > When the target JVM is in its own PID namespace the pid of the process is
> > > distinct from what the real pid of the process as it relates to the root
> > > container. For example, in the root container you can observe a JVM with a pid
> > > of 17734, however if that JVM is running inside a Docker container the pid
> > > inside its PID namespace is likely 1. So when the target JVM receives the
> > > SIGQUIT it looks in /proc/self/cwd/ for .attach_pid1 however the external
> > > attaching JVM has created the file /proc/17734/cwd/.attach_pid17734. Given this
> > > discrepancy the target JVM will output to stderr thread status, since
> > > /proc/self/cwd/.attach_pid1 doesn't exist and won't continue with the attach
> > > sequence.
> > > 
> > > The solution is to parse /proc/pid/status for the field NSpid (available since
> > > Linux 4.1) which contains a list of pids, where the last entry is the "inner
> > > most" PID namespace value. (Namespaces can be stacked, unlike Solaris Zones
> > > which have a virtualization depth of 1)
> > > 
> > > The rest of the Linux attach sequence assumes a shared mount namespace by
> > > waiting for /tmp/.java_pid17734 to appear. But if the attaching process is in a
> > > separate namespace because the target JVM is in a mount namepsace (or in a
> > > chroot as well) the unix domain socket for attaching won't appear.
> > > 
> > > Instead the attach sequence should resolve file names relative to
> > > /proc/17734/root which has a materialized view of the rootfs for the target.
> > > 
> > > Thanks!
> > > 
> > > TJ
> > > 
> > 
> > 
> > 
> > 
> > 


[prev in list] [next in list] [prev in thread] [next in thread] 

Configure | About | News | Add a list | Sponsored by KoreLogic