
List:       sleuthkit-users
Subject:    Re: [sleuthkit-users] text search in TSK or Autopsy
From:       Simson Garfinkel <simsong () acm ! org>
Date:       2008-03-15 3:06:30
Message-ID: 8D0CDF9E-0BD8-4452-BEDA-F281FE16A884 () acm ! org

Looks like Word2007 has gone from UTF-16 to UTF-8 for storing objects
in .doc files; you can find the text with "strings":

07:52 PM imac2:~/domex$
07:52 PM imac2:~/domex$ strings samples/Microsoft_Office2008_Mac/four-deep.doc | grep Ze
Zebra
07:53 PM imac2:~/domex$ strings samples/Microsoft_Office2008_Mac/four-deep.doc | grep Lio
Lions
07:53 PM imac2:~/domex$ strings samples/Microsoft_Office2008_Mac/four-deep.doc | grep Tig
Tigers
07:53 PM imac2:~/domex$ strings samples/Microsoft_Office2008_Mac/four-deep.doc | grep Bea
Bears
07:53 PM imac2:~/domex$ strings -e l samples/Microsoft_Office2008_Mac/four-deep.doc | grep Bea
strings: unknown flag: -e
Usage: strings [-] [-a] [-o] [-t format] [-number] [-n number] [[-arch <arch_flag>] ...] [--] [file ...]
07:53 PM imac2:~/domex$

(I'm on a Mac, where strings has no -e option.)
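
Where strings has no -e option, a short script can roughly stand in for
'strings -e l'. This is only a minimal Python sketch, assuming we care
about runs of printable ASCII characters stored as UTF-16LE; the function
name utf16le_strings and the 4-character minimum are my own choices, not
part of any existing tool:

```python
import re


def utf16le_strings(data: bytes, min_len: int = 4):
    """Yield runs of printable ASCII characters stored as UTF-16LE.

    Each character is matched as one printable byte followed by a NUL
    byte, which covers English text only (not all of UTF-16).
    """
    pattern = re.compile(rb"(?:[\x20-\x7e]\x00){%d,}" % min_len)
    for match in pattern.finditer(data):
        yield match.group().decode("utf-16-le")


# Example use (the path is whatever file you want to scan):
#   with open("four-deep.doc", "rb") as fh:
#       for s in utf16le_strings(fh.read()):
#           print(s)
```

Piping that output through grep should behave much like the
'strings -e l | grep' suggestion quoted below.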

But it won't show up with docx files, because they are compressed with  
PKZIP:

08:05 PM imac2:~/domex$ strings samples/Microsoft_Office2008_Mac/four-deep.docx  | grep ...........
[Content_Types].xml
_rels/.rels
word/_rels/document.xml.rels
word/document.xml
word/media/image1.png
iCCPICC Profile
word/theme/theme1.xml
docProps/thumbnail.jpeg
ICC_PROFILE
mntrRGB XYZ
Copyright 2007 Apple Inc., all rights reserved.
Generic RGB Profile
Generic RGB Profile
%&'()*456789:CDEFGHIJSTUVWXYZcdefghijstuvwxyz
&'()*56789:CDEFGHIJSTUVWXYZcdefghijstuvwxyz
word/embeddings/Microsoft_Word_Document1.docx
word/settings.xml
word/webSettings.xml
docProps/app.xml
docProps/core.xml
word/styles.xml
word/fontTable.xml
[Content_Types].xmlPK
_rels/.relsPK
word/_rels/document.xml.relsPK
word/document.xmlPK
word/media/image1.pngPK
word/theme/theme1.xmlPK
docProps/thumbnail.jpegPK
word/embeddings/Microsoft_Word_Document1.docxPK
word/settings.xmlPK
word/webSettings.xmlPK
docProps/app.xmlPK
docProps/core.xmlPK
word/styles.xmlPK
word/fontTable.xmlPK
08:05 PM imac2:~/domex$


(You get the filenames and apparently the JPEG and PNG contents, since  
those files aren't being compressed.)
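
Since a .docx is just a ZIP archive, one way around this is to decompress
each member before searching, and to recurse into any member that is
itself a ZIP (such as the embedded
word/embeddings/Microsoft_Word_Document1.docx above). A rough Python
sketch using only the standard zipfile module; the function name
find_in_zip and the "!" path separator are my own inventions, and it
will miss a keyword that Word has split across several XML runs:

```python
import io
import zipfile


def find_in_zip(data: bytes, keyword: str, prefix: str = ""):
    """Return paths of ZIP members whose decompressed bytes contain keyword.

    The keyword is searched both as UTF-8 and as UTF-16LE. Members that
    are themselves ZIP archives (e.g. an embedded .docx) are searched
    recursively; nested paths are joined with "!".
    """
    needles = (keyword.encode("utf-8"), keyword.encode("utf-16-le"))
    hits = []
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        for name in zf.namelist():
            body = zf.read(name)  # decompressed member contents
            path = prefix + name
            if any(needle in body for needle in needles):
                hits.append(path)
            if zipfile.is_zipfile(io.BytesIO(body)):
                hits.extend(find_in_zip(body, keyword, path + "!"))
    return hits


# Example use (the path is whatever .docx you want to search):
#   with open("four-deep.docx", "rb") as fh:
#       print(find_in_zip(fh.read(), "Zebra"))
```

This only handles ZIP-in-ZIP nesting; OLE objects embedded in an old
binary .doc would need a different parser.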

On Mar 14, 2008, at 5:09 PM, Brian Carrier wrote:

> Do you find the text if you use 'strings -e l' and then grep the  
> output?  That will allow you to look for UTF-16 English text (not  
> all UTF-16 though)
>
> brian
>
> On Mar 14, 2008, at 2:28 PM, Simson Garfinkel wrote:
>
>> What open source tools do people on this list use to do keyword
>> searching of files in an image?  One approach is to just search the
>> raw image and then backtrack to see what file that keyword is in.
>> This works for ASCII but doesn't work for encoded data.
>>
>> Here's the problem we are interested in solving: We have a Microsoft
>> Word file that contains an embedded Microsoft Word file that contains
>> the word "Zebra" in unicode. EnCase will find the Zebra; grepping
>> through the disk image will not.
>>
>> wvText, the open source text extraction tool that I use, will not run
>> recursively on embedded OLE objects.
>>
>> Is there an open source tool that will?
>>
>> Thanks!
>>
>> -Simson
>>
>>
>> -------------------------------------------------------------------------
>> This SF.net email is sponsored by: Microsoft
>> Defy all challenges. Microsoft(R) Visual Studio 2008.
>> http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/
>> _______________________________________________
>> sleuthkit-users mailing list
>> https://lists.sourceforge.net/lists/listinfo/sleuthkit-users
>> http://www.sleuthkit.org
>

