List: sleuthkit-users
Subject: Re: [sleuthkit-users] text search in TSK or Autopsy
From: Simson Garfinkel <simsong () acm ! org>
Date: 2008-03-15 3:06:30
Message-ID: 8D0CDF9E-0BD8-4452-BEDA-F281FE16A884 () acm ! org
Looks like Word 2007 has gone from UTF-16 to UTF-8 for storing objects
in .doc files; you can find the text with "strings":
07:52 PM imac2:~/domex$
07:52 PM imac2:~/domex$ strings samples/Microsoft_Office2008_Mac/four-deep.doc | grep Ze
Zebra
07:53 PM imac2:~/domex$ strings samples/Microsoft_Office2008_Mac/four-deep.doc | grep Lio
Lions
07:53 PM imac2:~/domex$ strings samples/Microsoft_Office2008_Mac/four-deep.doc | grep Tig
Tigers
07:53 PM imac2:~/domex$ strings samples/Microsoft_Office2008_Mac/four-deep.doc | grep Bea
Bears
07:53 PM imac2:~/domex$ strings -e l samples/Microsoft_Office2008_Mac/four-deep.doc | grep Bea
strings: unknown flag: -e
Usage: strings [-] [-a] [-o] [-t format] [-number] [-n number] [[-arch <arch_flag>] ...] [--] [file ...]
07:53 PM imac2:~/domex$
(I'm using a Mac, whose strings doesn't have the -e option.)
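Since the Mac strings above rejects -e, here is a rough Python approximation of plain `strings` and of GNU `strings -e l` (16-bit little-endian) for readers in the same situation. It is a sketch, not a reimplementation of either tool: the function name `extract_strings`, the printable-ASCII-plus-tab character set, and the 4-character default minimum (strings' usual default) are assumptions. The UTF-16LE mode only finds text whose characters fall in the ASCII range, which is the "not all UTF-16" caveat raised later in the thread.

```python
import re

# Printable ASCII plus tab: an approximation of the characters strings(1)
# treats as part of a string.
_PRINTABLE = rb"[\x20-\x7e\t]"


def extract_strings(data: bytes, min_len: int = 4, encoding: str = "ascii") -> list[str]:
    """Return runs of printable characters at least min_len long.

    encoding="ascii"    -> single-byte runs (like plain `strings`)
    encoding="utf-16le" -> runs where each printable ASCII byte is followed
                           by 0x00 (roughly `strings -e l`)
    """
    if min_len < 1:
        raise ValueError("min_len must be at least 1")
    if encoding == "ascii":
        pattern = re.compile(_PRINTABLE + b"{%d,}" % min_len)
        return [m.group().decode("ascii") for m in pattern.finditer(data)]
    if encoding == "utf-16le":
        # Each character is one printable byte followed by a zero byte.
        pattern = re.compile(b"(?:" + _PRINTABLE + b"\\x00){%d,}" % min_len)
        return [m.group().decode("utf-16-le") for m in pattern.finditer(data)]
    raise ValueError(f"unsupported encoding: {encoding!r}")


if __name__ == "__main__":
    # Sample words from the thread: "Zebra" as single-byte text,
    # "Lions" as UTF-16LE.
    sample = b"\x01\x02Zebra\x00\x00" + "Lions".encode("utf-16-le") + b"\xff"
    print(extract_strings(sample))                       # ['Zebra']
    print(extract_strings(sample, encoding="utf-16le"))  # ['Lions']
```

As with the real tool, the output can be piped through grep; the single-byte pass misses UTF-16 text because every other byte is zero.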
But the text won't show up in .docx files, because they are compressed
with PKZIP:
08:05 PM imac2:~/domex$ strings samples/Microsoft_Office2008_Mac/four-deep.docx | grep ...........
[Content_Types].xml
_rels/.rels
word/_rels/document.xml.rels
word/document.xml
word/media/image1.png
iCCPICC Profile
word/theme/theme1.xml
docProps/thumbnail.jpeg
ICC_PROFILE
mntrRGB XYZ
Copyright 2007 Apple Inc., all rights reserved.
Generic RGB Profile
Generic RGB Profile
%&'()*456789:CDEFGHIJSTUVWXYZcdefghijstuvwxyz
&'()*56789:CDEFGHIJSTUVWXYZcdefghijstuvwxyz
word/embeddings/Microsoft_Word_Document1.docx
word/settings.xml
word/webSettings.xml
docProps/app.xml
docProps/core.xml
word/styles.xml
word/fontTable.xml
[Content_Types].xmlPK
_rels/.relsPK
word/_rels/document.xml.relsPK
word/document.xmlPK
word/media/image1.pngPK
word/theme/theme1.xmlPK
docProps/thumbnail.jpegPK
word/embeddings/Microsoft_Word_Document1.docxPK
word/settings.xmlPK
word/webSettings.xmlPK
docProps/app.xmlPK
docProps/core.xmlPK
word/styles.xmlPK
word/fontTable.xmlPK
08:05 PM imac2:~/domex$
(You get the filenames and apparently the JPEG and PNG contents, since
those files aren't being compressed.)
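Because a .docx is a ZIP archive, its XML parts have to be decompressed before a keyword search can see them. A minimal Python sketch using the standard `zipfile` module: it searches each decompressed member for the keyword (as UTF-8 and UTF-16LE bytes) and recurses into members that are themselves ZIP archives, such as the embedded word/embeddings/Microsoft_Word_Document1.docx in the listing above. The names `search_zip` and `make_zip` and the `!` path separator are hypothetical; it does not descend into OLE2 (.doc) embeddings, which would need a separate compound-file parser.

```python
import io
import zipfile


def search_zip(data: bytes, keyword: str, path: str = "") -> list[str]:
    """Return archive paths whose decompressed bytes contain keyword.

    Nested ZIP members (e.g. an embedded .docx) are searched recursively;
    nested paths are joined with '!' (a hypothetical convention).
    """
    # UTF-8 covers the XML parts; UTF-16LE is included for other members.
    needles = [keyword.encode("utf-8"), keyword.encode("utf-16-le")]
    hits = []
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        for name in zf.namelist():
            member = zf.read(name)  # returns the decompressed bytes
            member_path = f"{path}!{name}" if path else name
            if any(n in member for n in needles):
                hits.append(member_path)
            # Recurse into members that are themselves ZIP archives.
            if zipfile.is_zipfile(io.BytesIO(member)):
                hits.extend(search_zip(member, keyword, member_path))
    return hits


def make_zip(files: dict[str, str]) -> bytes:
    """Hypothetical helper: build a deflate-compressed ZIP in memory."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for name, content in files.items():
            zf.writestr(name, content)
    return buf.getvalue()


if __name__ == "__main__":
    # A toy stand-in for a .docx that embeds another .docx.
    inner = make_zip({"word/document.xml": "<w:t>Zebra</w:t>"})
    outer = make_zip({
        "word/document.xml": "<w:t>Lions</w:t>",
        "word/embeddings/Microsoft_Word_Document1.docx": inner,
    })
    print(search_zip(outer, "Zebra"))
```

On the toy archive, "Zebra" is found only in the nested document.xml, which a plain strings/grep pass over the compressed file cannot see.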
On Mar 14, 2008, at 5:09 PM, Brian Carrier wrote:
> Do you find the text if you use 'strings -e l' and then grep the
> output? That will allow you to look for UTF-16 English text (though
> not all UTF-16).
>
> brian
>
> On Mar 14, 2008, at 2:28 PM, Simson Garfinkel wrote:
>
>> What open source tools do people on this list use to do keyword
>> searching of files in an image? One approach is to just search the
>> raw image and then backtrack to see which file the keyword is in.
>> This works for ASCII but doesn't work for encoded data.
>>
>> Here's the problem we are interested in solving: We have a Microsoft
>> Word file that contains an embedded Microsoft Word file that contains
>> the word "Zebra" in Unicode. EnCase will find the Zebra; grepping
>> through the disk image will not.
>>
>> wvText, the open source text extraction tool that I use, will not run
>> recursively on embedded OLE objects.
>>
>> Is there an open source tool that will?
>>
>> Thanks!
>>
>> -Simson
>>
>>
>> -------------------------------------------------------------------------
>> This SF.net email is sponsored by: Microsoft
>> Defy all challenges. Microsoft(R) Visual Studio 2008.
>> http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/
>> _______________________________________________
>> sleuthkit-users mailing list
>> https://lists.sourceforge.net/lists/listinfo/sleuthkit-users
>> http://www.sleuthkit.org
>
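The raw-image approach in the original question can be extended to uncompressed encoded text by searching for several byte encodings of the keyword at once and recording the byte offset of each hit; an offset can then be converted to a block number and mapped back to a file, for example with TSK's ifind. A minimal Python sketch, assuming the image is a flat file; `find_keyword_offsets`, its encoding list, and the chunk size are illustrative choices. It still cannot see text inside compressed containers such as .docx.

```python
import os
import tempfile


def find_keyword_offsets(path: str, keyword: str,
                         chunk_size: int = 1 << 20) -> list[tuple[int, str]]:
    """Scan a raw image for keyword in several encodings.

    Returns sorted (byte_offset, encoding) pairs. The file is read in chunks,
    and a tail of each chunk is kept so hits spanning a boundary are found.
    Note: a UTF-16LE string preceded by a zero byte also matches as
    UTF-16BE one byte earlier, so BE hits may be false positives.
    """
    if not keyword:
        raise ValueError("keyword must be non-empty")
    encodings = ["utf-8", "utf-16-le", "utf-16-be"]
    needles = {enc: keyword.encode(enc) for enc in encodings}
    overlap = max(len(n) for n in needles.values()) - 1
    hits: set[tuple[int, str]] = set()  # a set drops re-finds in the overlap
    offset = 0                          # file offset of buf[0]
    buf = b""
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            buf += chunk
            for enc, needle in needles.items():
                start = buf.find(needle)
                while start != -1:
                    hits.add((offset + start, enc))
                    start = buf.find(needle, start + 1)
            # Keep the last `overlap` bytes for the next iteration.
            keep = min(overlap, len(buf))
            offset += len(buf) - keep
            buf = buf[len(buf) - keep:]
    return sorted(hits)


if __name__ == "__main__":
    data = b"junk" + "Zebra".encode("utf-16-le") + b"..Zebra"
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(data)
    try:
        print(find_keyword_offsets(tmp.name, "Zebra", chunk_size=4))
        # [(4, 'utf-16-le'), (16, 'utf-8')]
    finally:
        os.remove(tmp.name)
```

A plain grep for "Zebra" would report only the second hit; the UTF-16LE copy is the kind of Unicode text the question describes.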