[prev in list] [next in list] [prev in thread] [next in thread] 

List:       wireshark-dev
Subject:    Re: [Wireshark-dev] Regenerating packet-parlay.c
From:       Jaap Keuter <jaap.keuter () xs4all ! nl>
Date:       2020-05-04 12:46:58
Message-ID: 14cbd101-1943-0b74-c18e-451c35396f67 () xs4all ! nl
[Download RAW message or body]

On 5/4/20 11:16 AM, Luke Mewburn wrote:
> On 20-05-04 10:55, Jaap Keuter wrote:
> > Hi Luke,
> > 
> > > Yes, I regenerated the code using that patch to
> > > tools/wireshark_gen.py, and it builds fine across a couple of
> > > platforms.
> > 
> > Yes, I could build it too, so the generated code itself was okay. My
> > quest is to regenerate the _same_ source files.
> 
> I don't think that's going to be possible to regenerate the same as the
> currently committed file, because it was generated with wireshark_gen.py
> whose implementation that did not guarantee reproducible output.
> 
> I think that unless you have access to the exact system setup that
> was used to generate packet-parlay.c that you won't be able reproduce
> the files as-is with the existing tool anyway.
> 
Good point. Also this drives home the point that generated code needs to be
reproducible.


> 
> > > I wrote a simple wrapper tool idl-regen (attached) to assist each
> > > regen from the idl, used with something like:
> > > 	$ ./idl-regen ../epan/dissectors/
> > > 	mmccs.idl:120: Warning: Identifier 'EventType' clashes with CORBA 3 keyword \
> > > 'eventtype'
> > > 	policy_data.idl:119: Warning: Anonymous sequences for recursive unions are \
> > > deprecated. Use a forward declaration instead.
> > > 	policy_data.idl:123: Warning: Anonymous sequences for recursive unions are \
> > > deprecated. Use a forward declaration instead.
> > > 
> > > I get the same .c files generated from either:
> > > - CentOS 7,  python 2.7.5, omniidl 4.2.0
> > > - Fedora 31, python 3.7.6, omniidl 4.2.3
> > > 
> > 
> > So the code generation is identical between Python 2 and 3, that's a
> > good sign.
> 
> Yes.
> 
> 
> > > Three files get changes from git master: - packet-gias.c -
> > > packet-parlay.c - packet-tango.c
> > > 
> > > The generated code for packet-gias.c and packet-tango.c differ to
> > > git master because of the following fix to idl2wrs that doesn't
> > > appear to have been rerun to regenerate those files: commit
> > > 443df93896 Author: Yannik Enss <Yannik.Enss@rohde-schwarz.com>
> > > Date:   2019-05-29 14:20:42 +0200
> > > 
> > > 	idl2wrs: fix 'undeclared identifier' error
> > > 
> > > 	the 'x_octetx' variables were removed a few years back, replace
> > > 	them with get_CDR_xxx()
> > > 	I8cf3410d8a152c834e7019f7d1d80de3798530c3
> > > 
> > 
> > Good find. I've applied the anti-patch for this and reran code
> > generation. Now the same source files come out (with the exception
> > below). I've also build with these files without issue, so can
> > anyone tell me what this commit achieves? It references something 'a
> > few years back' which doesn't help either. If nothing I'm inclined
> > to apply this anti-patch.
> > 
> > Gerald, you seem to have 'rubber stamped' this patch series about a
> > year ago. I know it's much to ask, but I can't find anything on
> > this, no bug report, no wireshark-dev posts, can you tell what this
> > was for and why the repository code is not the same as the code
> > generated?
> 
> I actually think the change is probably fine to leave, but I'll defer
> to others.
> 
True, but than it becomes a solution searching for a problem. I'd rather
understand the problem, or find there's no problem at all, so the solution can
be dropped.


> 
> > > packet-parlay.c gets that above change as well as many other
> > > changes to generated code for /* User exception filters */,
> > > mostly in the ordering of the declarations and the list of
> > > array entries added to proto_register_giop_parlay() variable
> > > 	static hf_register_info hf[] 
> > > 
> > 
> > Yes, this was the trigger for my initial question. Why is that ordering
> > different than what's committed in the repo. My position is that
> > generated files must always be possible to be regenerated.
> 
> I completely agree that generated files must be able to be reliably
> regenerated from the same inputs and tools without change.
> 
> As we both know, the issue is that the existing tool did not guarantee
> reproducible output, due to the use of a python dict which did not
> guarantee a particular ordering before python 3.6.
> 
> 
> 
> > > As far as I can tell, the new packet-parlay.c has the same
> > > number of lines, just in different order.
> > 
> > That was my impression too, but there are just too many to check by hand.
> > 
> > 
> > > and compared the regenerated file with that, they both still have same
> > > size and number of lines, and if I sort the lines in both the files are
> > > identical. (Of course, that doesn't compile... :)
> > > 
> > 
> > Oh, that is indeed a nice trick to see if at least the same lines are in the
> > generated code files. I'm going to use that.
> 
> It wasn't super robust, but it was just part of a broader group of
> tests I ran.
> 
Of course, it's partial, but significant.

> 
> > > I tried doing some comparison of the generated object files but
> > > they differed too much, probably because of the reordering of the
> > > source lines, even though the hf[] array ends up with the same
> > > content in different order.
> > > 
> > 
> > Okay, so your statement is that we just have to accept that the
> > ordering of the the currently generated source file is going to be
> > different from the committed source file. Do you think we still
> > should introduce the ordering statements?
> 
> I think we should use the OrderedDict change I proposed, which means
> that the ordering will be in order the data structure is populated.
> This matches all of the other data structures in wireshark_gen.py
> and wireshark_be.py (lists).
> 
> I don't think we need to change the code to sort all the different
> identifiers and generated code, however. That will result in a bigger
> change to all the source files generated from IDL.
> 
Agreed, so your improvements go in, as far as I'm concerned.


> 
> > > I don't know how much more verification is required, however.  It
> > > looks like in the past there have been changes to wireshark_gen.py
> > > and then regen of packet-parlay.c that reordered a lot of the
> > > content.  For example, see commit c99bee9b5d on 2019-06-05.
> > > 
> > 
> > Well, for me now reproducibility is the main goal. With your 'line
> > sorting trick' we can at least confidently state that the actual
> > items are not changed, and have to assume that the code generation
> > is solid enough to create equivalent code. I'm okay with that.
> 
> There may be some more testing that could be done with processing the
> output of objdump, but I'm not sure it's worth it.
> 
> As mentioned previously, I suspect that this level of verification
> of regenerated files has not been applied to the previously
> committed versions of these regenerated files based on reviewing
> their commit history.
> 
> 
> Another other test, of course, is to process some parlay captures with
> tshark with a full decode, and compare the output :)
> 
> 
Sure thing. I'll see what I can do.

thanks,
Jaap

___________________________________________________________________________
Sent via:    Wireshark-dev mailing list <wireshark-dev@wireshark.org>
Archives:    https://www.wireshark.org/lists/wireshark-dev
Unsubscribe: https://www.wireshark.org/mailman/options/wireshark-dev
             mailto:wireshark-dev-request@wireshark.org?subject=unsubscribe


[prev in list] [next in list] [prev in thread] [next in thread] 

Configure | About | News | Add a list | Sponsored by KoreLogic