I am trying to get the MTOM binary content using an extended class of SoapClient. The response looks something like this:
--uuid:8c73f23e-47d9-49fb-a61c-c1df7b19a306+id=2
Content-ID: <http://tempuri.org/0>
Content-Transfer-Encoding: 8bit
Content-Type: application/xop+xml;charset=utf-8;type="text/xml"

<big-xml-here>
<xop:Include href="cid:http://tempuri.org/1/636644204289948690" xmlns:xop="http://www.w3.org/2004/08/xop/include"/>
</big-xml-here>
--uuid:8c73f23e-47d9-49fb-a61c-c1df7b19a306+id=2--
Right after the XML, the MTOM response continues with the binaries referenced by the "cid" URL:
Content-ID: <http://tempuri.org/1/636644204289948690>
Content-Transfer-Encoding: binary
Content-Type: application/octet-stream

%PDF-1.4 %���� (lots of binary content here)
--uuid:7329cfb8-46a4-40a8-b15b-39b7b0988b57+id=4--
To extract everything I've tried this code:
$xop_elements = null;
preg_match_all('/<xop[\s\S]*?\/>/', $response, $xop_elements);
$xop_elements = reset($xop_elements);

if (is_array($xop_elements) && count($xop_elements)) {
    foreach ($xop_elements as $xop_element) {
        $cid = null;
        preg_match('/cid:(.*?)"/', $xop_element, $cid);

        if (isset($cid[1])) {
            $cid = $cid[1];
            $binary = null;
            preg_match("/Content-ID:.*?$cid.*?(.*?)uuid/", $response, $binary);
            var_dump($binary);
            exit();
        }
    }
}
Although the preg_match_all and the first preg_match are working, the last one,
/Content-ID:.*?$cid.*?(.*?)uuid/
is not working.
In the original source (https://github.com/debuss/MTOMSoapClient/blob/master/MTOMSoapClient.php) the regex is
/Content-ID:[\s\S].+?'.$cid.'[\s\S].+?>([\s\S]*?)--uuid/
but I got an error on PHP 7:
preg_match(): Unknown modifier '/'
Is there a way to get the MTOM binary for each CID?
Thanks in advance!
2 Answers
Answer 1
You first need to escape $cid with preg_quote. It contains slashes, which end the / delimited pattern early, and that is what causes your first error:
$cid = preg_quote($cid[1], '/');
Next, you need to use the s modifier so that . also matches newlines:
preg_match("/Content-ID:.*?$cid.*?(.*?)uuid/s", $response, $binary);
s (PCRE_DOTALL) If this modifier is set, a dot metacharacter in the pattern matches all characters, including newlines. Without it, newlines are excluded. This modifier is equivalent to Perl's /s modifier. A negative class such as [^a] always matches a newline character, independent of the setting of this modifier.
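To see both fixes together, here is a minimal sketch run against a shortened, made-up MTOM part. The $response string and the "fake pdf bytes" payload are assumptions for illustration only; a real response would carry raw binary there.

```php
<?php
// Hypothetical, shortened MTOM part standing in for the real response.
$response = "Content-ID: <http://tempuri.org/1/636644204289948690>\r\n"
    . "Content-Transfer-Encoding: binary\r\n"
    . "Content-Type: application/octet-stream\r\n"
    . "\r\n"
    . "%PDF-1.4\nfake pdf bytes\r\n"
    . "--uuid:7329cfb8-46a4-40a8-b15b-39b7b0988b57+id=4--";

$rawCid = 'http://tempuri.org/1/636644204289948690';

// Escape regex metacharacters and the "/" delimiter inside the CID.
$cid = preg_quote($rawCid, '/');

// Without the s modifier, "." stops at "\n", so the pattern cannot span lines.
$withoutS = preg_match("/Content-ID:.*?$cid.*?(.*?)uuid/", $response);

// With the s modifier, "." also matches newlines and the capture spans lines.
$withS = preg_match("/Content-ID:.*?$cid.*?(.*?)uuid/s", $response, $binary);
```

Note that the captured group still begins with the closing ">" and the remaining part headers, so it needs further trimming before it can be used as a file.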
Answer 2
As I understand it, you are trying to adapt the original code to your variant of the SOAP response.
Instead of a number, you want to capture the whole http://tempuri.org/1/636644204289948690 in the $cid variable (you may want to rename the var). To do so, you could use the following regex, which matches everything but a double quote in capture group 1: cid:([^"]+)
preg_match('/cid:([^"]+)/', $xop_element, $cid);
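For instance, applied to the xop:Include element from the question, this should leave the full URL in group 1. A small sketch; the $content_id and $m names are only illustrative:

```php
<?php
// The xop:Include element as it appears in the question's response.
$xop_element = '<xop:Include href="cid:http://tempuri.org/1/636644204289948690" '
    . 'xmlns:xop="http://www.w3.org/2004/08/xop/include"/>';

$content_id = null;
// Capture group 1 takes everything after "cid:" up to the closing double quote.
if (preg_match('/cid:([^"]+)/', $xop_element, $m)) {
    $content_id = $m[1];
}
```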
So far, so good. Guessing from your description you should use the following pattern to capture the binary part:
'%Content-ID: <'.$cid.'>([\s\S]*?)--uuid%'
We use a modified dot [\s\S] to match across multiple lines (as the original implementation does as well). Alternatively, add the s (single-line) flag or the (?s) inline modifier. Also, I use the alternative regex delimiter % to avoid escaping problems. It is still a good idea to use preg_quote($cid[1], '%') as suggested by Tarun.
Now, you can retrieve the block in question from capture group 1:
trim($binary[1]);
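Putting the pieces together, a rough sketch could look like the following. The function name extract_mtom_parts is hypothetical, and it assumes each part starts with a header of the exact form "Content-ID: <...>" and ends at the next "--uuid" boundary, as in your sample.

```php
<?php
/**
 * Hypothetical helper: returns [content-id => trimmed part block]
 * for every xop:Include found in the raw MTOM response.
 */
function extract_mtom_parts(string $response): array
{
    $parts = [];

    // Collect every CID referenced by an xop:Include element.
    if (!preg_match_all('/<xop:Include[^>]*href="cid:([^"]+)"/', $response, $m)) {
        return $parts;
    }

    foreach ($m[1] as $cid) {
        // Escape the CID for use inside a %-delimited pattern.
        $quoted = preg_quote($cid, '%');
        $pattern = '%Content-ID: <' . $quoted . '>([\s\S]*?)--uuid%';

        if (preg_match($pattern, $response, $binary)) {
            $parts[$cid] = trim($binary[1]);
        }
    }

    return $parts;
}
```

Keep in mind that the captured block still contains the remaining part headers (Content-Transfer-Encoding, Content-Type) ahead of the payload, and that trim() may also strip whitespace bytes that belong to the binary itself, so you may prefer to cut at the first blank line instead.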