List: xml4lib
Subject: [XML4Lib] regex matching problem
From: Dana Pearson <dbpearsonmlis () GMAIL ! COM>
Date: 2014-07-17 1:16:52
Message-ID: CA+g3ULutk-KKgW2SFyJJv7kizYcJ_-nYCjmMyZYeH0FR=YjooQ () mail ! gmail ! com
I am stumped on a regular expression to capture the dimensions of 2
photographs, the original and the 'access image'.
The source contains a string; the following 3 examples are representative
of its variation.
<udf21>Scanned on Epson 10000 XL, with Adobe Photoshop. 1100 dpi.
5235x4152 pixels, access image 3000 x1322 pixels.</udf21>
<udf21>Scanned on Epson 10000 XL, with Adobe Photoshop. 3300 dpi.
1795x5100 pixels, access image 982x3000 pixels.</udf21>
<udf21>Scanned on Epson 10000 XL, with Adobe Photoshop. 5070 x 3344
pixels, access image 3000x1979 pixels.</udf21>
The strings are relatively uniform, except that some do not have a number
followed by 'dpi', and there are sometimes spaces before and/or after the
'x' in the string I'm trying to capture with analyze-string (XSLT 2.0).
All dimensions have 4 digits except one (2nd example, access image 982x3000
pixels).
I did not anticipate this being a difficult regex problem but I cannot
find the solution.
The following regex is as close as I can come to matching all 94 instances
of element udf21.
My regex (curly brackets doubled because the regex attribute is an
attribute value template):
(.*)(\d{{4}}\s?x\s?\d{{4}})(.*)(\d{{3,4}}\s?x\s?\d{{4}})(.*)
Perfect for the second example, but for all the others regex-group(4) holds
only the 2nd, 3rd and 4th of the 4 digits in the first number of the access
image dimensions:
000 x1322
982x3000
000x1979
<xsl:analyze-string
    select="."
    regex="(.*)(\d{{4}}\s?x\s?\d{{4}})(.*)(\d{{3,4}}\s?x\s?\d{{4}})(.*)">
  <xsl:matching-substring>
    <subfield code="d"><xsl:value-of select="regex-group(4)"/></subfield>
  </xsl:matching-substring>
</xsl:analyze-string>
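The same behaviour can be reproduced outside the stylesheet. The following is a minimal sketch in Python rather than XSLT, assuming Python's `re` module treats the constructs used here (greedy `.*`, `\d{3,4}`, `\s?`) the same way the XPath 2.0 regex dialect does. The names `PATTERN`, `samples` and `groups_3_and_4` are illustrative only, and the sample strings are the three examples above joined onto a single line each.

```python
import re

# The regex from the stylesheet with the braces un-doubled; the doubling is
# only needed inside the XSLT attribute, not in the regex itself.
PATTERN = re.compile(r"(.*)(\d{4}\s?x\s?\d{4})(.*)(\d{3,4}\s?x\s?\d{4})(.*)")

# The three representative udf21 values (assumed to be single-line strings).
samples = [
    "Scanned on Epson 10000 XL, with Adobe Photoshop. 1100 dpi. "
    "5235x4152 pixels, access image 3000 x1322 pixels.",
    "Scanned on Epson 10000 XL, with Adobe Photoshop. 3300 dpi. "
    "1795x5100 pixels, access image 982x3000 pixels.",
    "Scanned on Epson 10000 XL, with Adobe Photoshop. 5070 x 3344 "
    "pixels, access image 3000x1979 pixels.",
]


def groups_3_and_4(text):
    """Return (group 3, group 4) for the first match, or None if no match."""
    match = PATTERN.search(text)
    if match is None:
        return None
    return match.group(3), match.group(4)


results = [groups_3_and_4(s) for s in samples]
for group3, group4 in results:
    # repr() makes the trailing characters of group 3 visible.
    print(repr(group3), "|", repr(group4))
```

Printing group 3 next to group 4 shows where the missing digit ends up: for the first and third strings group 3 ends in `3`, and group 4 starts at `000`.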
I have also tried:
(.*)(\d{4}\s?x\s?\d{4})(.*)(\d\d\d\d?\s?x\s?\d{4})(.*)
The result is the same.
I think I have run up against a subtlety of regular expressions beyond my
understanding.
thanks,
dana
--
Dana Pearson
dbpearsonmlis.com
Metadata and Bibliographic Services for Libraries
================================
To unsubscribe: http://bit.ly/xml4lib
XML4Lib Web Site: http://xml4lib.org/
2014-07-16