'Re: [sleuthkit-developers] NTFS data run collisions'


List:       sleuthkit-developers
Subject:    Re: [sleuthkit-developers] NTFS data run collisions
From:       "Hu, Hongyi - 0559 - MITLL" <Hongyi.Hu () ll ! mit ! edu>
Date:       2014-04-04 18:24:23
Message-ID: CF6470F6.24F1%Hongyi.Hu () ll ! mit ! edu



Sorry, forgot to add the attachments.
-- 
Hongyi Hu

MIT Lincoln Laboratory
Group 59 (Cyber System Assessments)
Ph: (781) 981-8224

From:  Hongyi Hu <Hongyi.Hu@ll.mit.edu>
Date:  Friday, April 4, 2014 2:22 PM
To:  Alex Nelson <ajnelson@cs.ucsc.edu>
Cc:  "sleuthkit-developers@lists.sourceforge.net"
<sleuthkit-developers@lists.sourceforge.net>
Subject:  Re: [sleuthkit-developers] NTFS data run collisions

Hi Alex,

Thanks for the response.  I wasn't able to come back to this issue until this
week; I found a bunch of bugs in analyzeMFT that were throwing off the
calculations.

It looks like the overlaps were due to my misunderstanding of how sparse and
compressed data runs work in NTFS, so at least for TSK it looks like there
aren't collisions between different MFT entry numbers.


I have a follow-up question about data runs that I find highly perplexing.  I've
attached an odd example of a raw MFT entry (of a zip file) from my clean
disk image, along with a hex dump that includes my math and notes.
I don't understand how TSK is parsing the data runs.

The data run snippet is:

31 01 4c 6c 05
21 03 71 01
31 16 be 31 fd 
03 00 94 15 
01 31 
6f 9a 7c ff 31 27 04 bc 0d 31 4f 71 44 01 00  f5 80 00 00 00 00 80 00
00(End)

But TSK is interpreting the data runs as

31 01 4c 6c 05
21 03 71 01
31 16 be 31 fd 
03 00 94 15 01 
31 6f 9a 7c ff 
31 27 04 bc 0d 
31 4f 71 44 01 
00 (End)

TSK seems to be right, but I don't understand what it's doing.

My analysis by hand (which is the same as what analyzeMFT gives me and
consistent with all the NTFS documentation I could find) gives me the
following runs.  The first three are normal; I get the same result as TSK.
The last few diverge.

31 01 4c 6c 05 (normal)
len 0x01   offset 0x056c4c == 355404   Cluster Address == 355404

21 03 71 01 (normal)
len 0x03   offset 0x0171 == 369   Cluster Address == 355404 + 369 == 355773

31 16 be 31 fd (normal)
len 0x16 (22)   offset 0xfd31be == -183874   Cluster Address == 355773 - 183874 == 171899
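
The arithmetic for these three runs can be reproduced with a short decoder.
This is only a minimal sketch, assuming the run-list layout used in the hand
analysis here (low nibble of the header byte = size of the length field, high
nibble = size of the offset field, offsets signed little-endian and relative
to the previous run).  decode_runs is a hypothetical helper, not a function
from TSK or analyzeMFT.

```python
def decode_runs(data):
    """Decode an NTFS run list into (length, start_cluster) tuples.

    start_cluster is None for a sparse run (offset field of size 0).
    """
    runs = []
    pos = 0
    prev = 0  # absolute cluster address of the previous non-sparse run
    while pos < len(data) and data[pos] != 0x00:
        header = data[pos]
        len_size = header & 0x0F  # low nibble: bytes in the length field
        off_size = header >> 4    # high nibble: bytes in the offset field
        pos += 1
        length = int.from_bytes(data[pos:pos + len_size], "little")
        pos += len_size
        if off_size == 0:
            runs.append((length, None))
        else:
            # The offset is a signed delta from the previous run's start.
            delta = int.from_bytes(data[pos:pos + off_size], "little", signed=True)
            prev += delta
            runs.append((length, prev))
        pos += off_size
    return runs


# The first three runs from the snippet, followed by the 00 terminator.
first_three = bytes.fromhex("31014c6c05 21037101 3116be31fd 00")
print(decode_runs(first_three))
# -> [(1, 355404), (3, 355773), (22, 171899)]
```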


Here's where I'm confused:

03 00 94 15 (sparse)
The header gives me a 0 byte offset field and a 3 byte length field.
0 byte offset field means a sparse data run (so these runs don't take up
disk space and return 0s when read)
3 byte length field gives me a length of 0x159400 == 1414144

01 31 (sparse)
0 byte offset field
1 byte length field == length 0x31

6f 9a 7c ff 31 27 04  bc 0d 31 4f 71 44 01 00 f5 80 00 00 00 00 80 00
Something is clearly wrong here.
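
The header reading used above can be written out explicitly.  This is a small
sketch under the same assumption about the header byte (low nibble = length
field size, high nibble = offset field size); split_header is a hypothetical
name chosen for illustration.  Read this way, the byte 6f would declare a
15-byte length field and a 6-byte offset field, which is why the remaining
bytes look wrong.

```python
def split_header(header):
    """Return (length_field_size, offset_field_size) for a run header byte."""
    return header & 0x0F, header >> 4


for h in (0x31, 0x03, 0x01, 0x6F):
    print(hex(h), split_header(h))
# 0x31 (1, 3)
# 0x3 (3, 0)
# 0x1 (1, 0)
# 0x6f (15, 6)
```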



TSK gives me something more reasonable:

[Len: 1, Addr: 355404],
[Len: 3, Addr: 355773],
[Len: 22, Addr: 171899],
[Len: 39, Addr: 242959],
[Len: 111, Addr: 209321],
[Len: 39, Addr: 1109421],
[Len: 79, Addr: 1192478],

The first three runs are the same, but the rest are different.  TSK seems to
interpret the runs like this:

31 01 4c 6c 05
21 03 71 01
31 16 be 31 fd 
03 00 94 15 01 
31 6f 9a 7c ff 
31 27 04 bc 0d 
31 4f 71 44 01 
00 (End)


This only makes sense to me if the fourth line were 31 27 94 15 01 instead
of 03 00 94 15 01.  Then TSK's numbers and parsing check out with the raw
run list.  I believe that TSK is correct, but I don't understand how it is
parsing the data runs here.
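
The substitution described above can be checked mechanically.  The sketch
below is not TSK's code; it is a hypothetical decoder that assumes the same
run-list layout as the hand analysis and assumes no sparse runs in the input.
It swaps the bytes 03 00 for 31 27 at the start of the fourth run and compares
the result with the run list TSK reports.

```python
def decode_runs(data):
    """Decode a run list with no sparse runs into (length, cluster) tuples."""
    runs, pos, prev = [], 0, 0
    while data[pos] != 0x00:
        len_size, off_size = data[pos] & 0x0F, data[pos] >> 4
        length = int.from_bytes(data[pos + 1:pos + 1 + len_size], "little")
        off_start = pos + 1 + len_size
        # Offsets are signed and relative to the previous run's start.
        delta = int.from_bytes(data[off_start:off_start + off_size],
                               "little", signed=True)
        prev += delta
        runs.append((length, prev))
        pos = off_start + off_size
    return runs


# Raw run list as it appears in the snippet, up to the 00 terminator.
raw = bytes.fromhex(
    "31014c6c05 21037101 3116be31fd "
    "0300941501 316f9a7cff 312704bc0d 314f714401 00"
)

# Hypothesis from the message: the fourth run starts with 31 27, not 03 00.
patched = raw.replace(bytes.fromhex("0300941501"), bytes.fromhex("3127941501"))

tsk_runs = [
    (1, 355404), (3, 355773), (22, 171899), (39, 242959),
    (111, 209321), (39, 1109421), (79, 1192478),
]
print(decode_runs(patched) == tsk_runs)  # -> True
```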

Any ideas?

Thanks!

-- 
Hongyi Hu

MIT Lincoln Laboratory
Group 59 (Cyber System Assessments)
Ph: (781) 981-8224

From: Alex Nelson <ajnelson@cs.ucsc.edu>
Date: Wednesday, March 26, 2014 10:52 AM
To: Hongyi Hu <Hongyi.Hu@ll.mit.edu>
Cc: "sleuthkit-developers@lists.sourceforge.net"
<sleuthkit-developers@lists.sourceforge.net>
Subject: Re: [sleuthkit-developers] NTFS data run collisions

Hi Hongyi, 

For clarification, these are allocated files you're asking about, right?  If
some of the files are deleted, the answer is pretty straightforward.

Also, are you asking about partial or total overlaps?  You should be
building your hash table based on MFT entry numbers, not on file names.
NTFS allows multiple hard links.
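
The suggestion above (key the table on MFT entry numbers rather than file
names) could be sketched roughly as follows.  This is an illustration only:
build_cluster_map and find_collisions are hypothetical helpers, the run lists
are assumed to be already decoded into (length, start_cluster) pairs with None
marking a sparse run, and the entry numbers and runs in the example are made
up.

```python
from collections import defaultdict


def build_cluster_map(files):
    """Map each cluster to the set of MFT entry numbers that reference it.

    files: dict of MFT entry number -> list of (length, start_cluster) runs.
    """
    owners = defaultdict(set)
    for entry, runs in files.items():
        for length, start in runs:
            if start is None:  # sparse run: occupies no clusters on disk
                continue
            for cluster in range(start, start + length):
                owners[cluster].add(entry)
    return owners


def find_collisions(owners):
    """Return only the clusters claimed by more than one MFT entry."""
    return {c: e for c, e in owners.items() if len(e) > 1}


# Hypothetical entries: 100 and 200 overlap on clusters 1002 and 1003.
example = {
    100: [(4, 1000), (8, None)],
    200: [(3, 1002)],
}
print(find_collisions(build_cluster_map(example)))
# -> {1002: {100, 200}, 1003: {100, 200}}
```

Because the table is keyed on entry numbers, two hard links to the same file
map to one owner and are not reported as a collision.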

Do you have example files you could reference in one of the publicly
available disk images?  (One of the M57's will likely give you an example.)
http://www.forensicswiki.org/wiki/Forensic_corpora#Disk_Images

--Alex


On Mar 25, 2014, at 14:00 , Hu, Hongyi - 0559 - MITLL <Hongyi.Hu@ll.mit.edu>
wrote:

> Hi,
> 
> I'm an NTFS rookie with a question about data runs.  Are there any normal
> reasons why two different files might have overlapping data runs, i.e. mapped
> to some of the same clusters/blocks on the disk?
> 
> For a research project, I would like to do the following: given a sector on
> the disk, determine what file (if any) owns the data in that sector.  The
> first thing I tried was to build a simple block to filename hash table.  For
> each file, I look at its data runs and put them into the table.  With both TSK
> and the analyzeMFT library and using a clean Windows XP disk image, I get a
> non-trivial number of block collisions.
> 
> Is this normal behavior?  I would have thought that the block assignments
> would be unique.  I have not been successful finding any info about this in
> various documentation.
> 
> 
> Thanks!
> 
> -- 
> Hongyi Hu
> 
> MIT Lincoln Laboratory
> Group 59 (Cyber System Assessments)
> Ph: (781) 981-8224
> sleuthkit-developers mailing list
> sleuthkit-developers@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/sleuthkit-developers







["mft2_16793.hex" (application/octet-stream)]
["mft2_16793.raw" (application/octet-stream)]


_______________________________________________
sleuthkit-developers mailing list
sleuthkit-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/sleuthkit-developers

