regex - Regex, skip nested pairs


In my own markup language, I have a quote tag, >> <<, which uses these characters to create a blockquote. The problem begins when nested blockquotes occur:

  >> (1) start 1 >> (2) quote 2! << (3) << (4)

I would like to match only the outermost tags, resulting in:

  <blockquote>start 1 >> quote 2! << </blockquote>

If I use a simple ungreedy regex, />>(.+?)<</, then (1) and (3) will match, and (2) and (4) will never be matched. If I make it greedy instead, then for

  >> (A) quote 1 << (B) >> (C) quote 2 << (D)

the greedy version will match from (A) to (D), leaving (B) and (C) alone. I think I would have to make it "ungreedy" in some way, but only if there is no other ">>" pair in between, which is beyond my skills. Is there a way to make this work properly, so that (1) matches (4), (A) matches (B), and (C) matches (D)? If you can think of a non-regex solution (but not a parser), that would be great for me too. I am not asking how to match (2) with (3), only how to successfully skip them (or any other nested pairs).
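The two failure modes described above can be reproduced with a short script. This is only an illustrative sketch: it uses Python's `re` module rather than the engine behind the markup language, and the variable names are made up for the example.

```python
import re

# The two inputs from the question, without the (1)..(4) / (A)..(D) labels.
nested = ">> start 1 >> quote 2! << <<"
siblings = ">> quote 1 << >> quote 2 <<"

lazy = re.compile(r">>(.+?)<<")    # ungreedy: stops at the first "<<"
greedy = re.compile(r">>(.+)<<")   # greedy: runs to the last "<<"

# Ungreedy pairs the outer ">>" with the inner "<<", i.e. (1) with (3).
lazy_nested = lazy.findall(nested)

# Greedy swallows both sibling quotes in one match, i.e. (A) to (D).
greedy_siblings = greedy.findall(siblings)

print(lazy_nested)
print(greedy_siblings)
```

Each quantifier handles one of the two inputs correctly and fails on the other, which is why neither can be used on its own.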

Success! As Arjen suggested, in the end I used this kind of construct (not necessarily working code):

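  // Encode every lone ">" so that only ">>" pairs remain as markup.
  $text = preg_replace('/([^>]|^)>([^>]|$)/', '$1&gt;$2', $text);
  // Hash innermost pairs repeatedly (loop condition is a reconstruction).
  do {
      $len = strlen($text);
      $text = preg_replace_callback('/>>([^>]+?)<</', 'blockHashFunction', $text);
  } while (strlen($text) != $len);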

I first encode all the single > characters and then run a recursive preg_replace. Hashing in this case means that >>asdsad<< is replaced by, for example, "\xFE:3:\xFE". At the end of the script these are correctly replaced with <blockquote>asdsad</blockquote>.

Regular expressions are not really suited to this type of parsing. That said, some regex engines do support nested/balanced matching, for example the .NET Framework regex engine. However, I think this leads to very complex patterns.

You are better off writing a regular expression that matches either a start tag or an end tag and building a tree of all the matches manually. After processing the entire string, you can discard the unwanted matches from the resulting collection.
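A minimal sketch of that idea follows, written in Python for brevity. It simplifies the suggestion: since only the outermost pairs are wanted, it keeps a depth counter instead of building a full tree. The function names `outermost_spans` and `to_blockquotes` are hypothetical, and the handling of unbalanced tags (stray `<<` ignored, unclosed `>>` left untouched) is an assumption, not something the answer specifies.

```python
import re

# One pattern that matches either a start tag or an end tag.
TOKEN = re.compile(r">>|<<")

def outermost_spans(text):
    """Return (start, end) offsets of the content of each outermost pair."""
    spans = []
    depth = 0
    start = None
    for m in TOKEN.finditer(text):
        if m.group() == ">>":
            if depth == 0:
                start = m.end()      # content begins after the opening tag
            depth += 1
        elif depth > 0:              # a "<<" with no open ">>" is ignored
            depth -= 1
            if depth == 0:
                spans.append((start, m.start()))
    return spans

def to_blockquotes(text):
    """Wrap outermost pairs in <blockquote>, leaving nested tags as they are."""
    out = []
    pos = 0
    for start, end in outermost_spans(text):
        out.append(text[pos:start - 2])   # text before the opening ">>"
        out.append("<blockquote>" + text[start:end] + "</blockquote>")
        pos = end + 2                     # skip the closing "<<"
    out.append(text[pos:])
    return "".join(out)

print(to_blockquotes(">> start 1 >> quote 2! << <<"))
print(to_blockquotes(">> quote 1 << >> quote 2 <<"))
```

The same scan should port to PHP with `preg_match_all` and the `PREG_OFFSET_CAPTURE` flag, which returns each match together with its offset. Nested quotes can then be handled by running the function again on the content of each blockquote.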
