php - How to get the contents inside alt?


How do I pick out the contents of the alt attribute with a regex?

Given this text:

<a href="gallery.com/gallery-name"; target="_blank"> <img class="aligncenter" src="myblog.com/wp-content/image.jpg " alt=" want text " width=" 400 " height="300" /></a>

How do I match just the text inside alt (here, want text)?

I've tried alt=".*", but it yields alt=" want text " width=" 400 " height="300", which is undesirable.
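The over-match happens because .* is greedy: it runs to the end of the line and backs up only as far as the last double quote. A minimal sketch of the difference, written in Python as an assumption of this example (the thread is about PHP, but both patterns use only syntax that PCRE also accepts); the variable names are illustrative only:

```python
import re

# The line from the question, split across string literals for readability.
html = (
    '<a href="gallery.com/gallery-name"; target="_blank"> '
    '<img class="aligncenter" src="myblog.com/wp-content/image.jpg " '
    'alt=" want text " width=" 400 " height="300" /></a>'
)

# Greedy: .* extends to the LAST double quote on the line.
greedy = re.search(r'alt=".*"', html).group(0)

# Negated class: [^"]* cannot cross a double quote, so it stops at the
# quote that closes the alt value.
narrow = re.search(r'alt="([^"]*)"', html).group(1)

print(greedy)          # alt=" want text " width=" 400 " height="300"
print(repr(narrow))    # ' want text '
print(narrow.strip())  # want text
```

This narrow pattern is only adequate for input as simple as the line above; it would also match the text alt="..." appearing inside another attribute's value.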

foreword

You should really use an HTML parser for this, but you seem to have creative control over the source string, and if the input is really this simple then the edge cases should be reduced.
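For comparison, here is a rough sketch of the parser route. It uses Python's standard html.parser module, which is an assumption of this example rather than anything from the thread (in PHP the usual counterpart would be DOMDocument). AltCollector and img_alts are hypothetical names made up for the illustration.

```python
from html.parser import HTMLParser


class AltCollector(HTMLParser):
    """Collects the alt value of every <img> tag it sees."""

    def __init__(self):
        super().__init__()
        self.alts = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs with the quotes removed.
        # Self-closing tags such as <img ... /> are routed here by default.
        if tag == "img":
            self.alts.extend(value for name, value in attrs if name == "alt")


def img_alts(html):
    """Return the alt values of all <img> tags in html, in document order."""
    collector = AltCollector()
    collector.feed(html)
    collector.close()
    return collector.alts


html = (
    '<img class="aligncenter" src="myblog.com/wp-content/image.jpg" '
    'alt=" want text" width=" 400 " height="300" />'
)
print(img_alts(html))  # [' want text']
```

Because the parser tokenizes attributes itself, an alt="..." that merely appears inside another attribute's value is not mistaken for the real alt attribute.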

description

<img\s(?=(?:[^>=]|='[^']*'|="[^"]*"|=[^'"][^\s>]*)*?\salt=['"]([^"]*)['"]?)(?:[^>=]|='[^']*'|="[^"]*"|=[^'"\s]*)*"\s?\/?>


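This regular expression does the following:

  • finds all image tags
  • requires the image tag to have an alt attribute
  • captures the alt attribute value and puts it into capture group 1
  • allows the other attribute values to be surrounded by single, double, or no quotes (the alt value itself is captured up to the next double quote, so double-quoted alt values are the reliable case)
  • avoids some pretty difficult edge cases that make matching HTML difficult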

example

live demo

https://regex101.com/r/cn0ld4/2

sample text

Note the difficult edge case in the second img tag: the text alt="..." also appears inside the single-quoted onmouseover value.

<a href="gallery.com/gallery-name"; target="_blank"> <img class="aligncenter" src="myblog.com/wp-content/image.jpg" alt=" want text" width =" 400 " height="300" /></a>  <img onmouseover='  alt="this not droid looking for" ;'  class="aligncenter" src="myblog.com/wp-content/image.jpg" alt="this droid i'm looking for." width =" 400 " height="300" /> 

sample matches

  • capture group 0 gets the entire img tag
  • capture group 1 gets the value in the alt attribute, not including the surrounding quotes

[0][0] = <img class="aligncenter" src="myblog.com/wp-content/image.jpg" alt=" want text" width =" 400 " height="300" />
[0][1] =  want text
[1][0] = <img onmouseover='  alt="this not droid looking for" ;'  class="aligncenter" src="myblog.com/wp-content/image.jpg" alt="this droid i'm looking for." width =" 400 " height="300" />
[1][1] = this droid i'm looking for.
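The matches above can be reproduced with a short script. This sketch uses Python's re module as an assumption of the example (the thread is about PHP; the pattern has no Python-specific syntax, and a PCRE-based function such as preg_match_all should accept it once delimiters are added). IMG_ALT, sample and matches are names invented here.

```python
import re

# The pattern from the description, in a raw triple-quoted string so that
# both quote characters can appear unescaped.
IMG_ALT = re.compile(
    r'''<img\s(?=(?:[^>=]|='[^']*'|="[^"]*"|=[^'"][^\s>]*)*?\salt=['"]([^"]*)['"]?)(?:[^>=]|='[^']*'|="[^"]*"|=[^'"\s]*)*"\s?\/?>'''
)

# The sample text, split across adjacent string literals.
sample = (
    '<a href="gallery.com/gallery-name"; target="_blank"> '
    '<img class="aligncenter" src="myblog.com/wp-content/image.jpg" '
    'alt=" want text" width =" 400 " height="300" /></a>  '
    '''<img onmouseover='  alt="this not droid looking for" ;'  '''
    '''class="aligncenter" src="myblog.com/wp-content/image.jpg" '''
    '''alt="this droid i'm looking for." width =" 400 " height="300" />'''
)

# group(0) is the whole img tag, group(1) the alt value without its quotes.
matches = [(m.group(0), m.group(1)) for m in IMG_ALT.finditer(sample)]

for i, (tag, alt) in enumerate(matches):
    print(f"[{i}][0] = {tag}")
    print(f"[{i}][1] = {alt}")
```

The decoy alt inside onmouseover is skipped because the single-quoted value is consumed as one unit before the look-ahead searches for a real alt attribute.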

explanation

node                     explanation
----------------------------------------------------------------------
<img                     '<img'
\s                       whitespace (\n, \r, \t, \f, and " ")
(?=                      look ahead to see if there is:
  (?:                      group, but do not capture (0 or more times (matching the least amount possible)):
    [^>=]                    any character except: '>', '='
   |                        or
    ='                       '=\''
    [^']*                    any character except: ''' (0 or more times (matching the most amount possible))
    '                        '\''
   |                        or
    ="                       '="'
    [^"]*                    any character except: '"' (0 or more times (matching the most amount possible))
    "                        '"'
   |                        or
    =                        '='
    [^'"]                    any character except: ''', '"'
    [^\s>]*                  any character except: whitespace (\n, \r, \t, \f, and " "), '>' (0 or more times (matching the most amount possible))
  )*?                      end of grouping
  \s                       whitespace (\n, \r, \t, \f, and " ")
  alt=                     'alt='
  ['"]                     any character of: ''', '"'
  (                        group and capture to \1:
    [^"]*                    any character except: '"' (0 or more times (matching the most amount possible))
  )                        end of \1
  ['"]?                    any character of: ''', '"' (optional (matching the most amount possible))
)                        end of look-ahead
(?:                      group, but do not capture (0 or more times (matching the most amount possible)):
  [^>=]                    any character except: '>', '='
 |                        or
  ='                       '=\''
  [^']*                    any character except: ''' (0 or more times (matching the most amount possible))
  '                        '\''
 |                        or
  ="                       '="'
  [^"]*                    any character except: '"' (0 or more times (matching the most amount possible))
  "                        '"'
 |                        or
  =                        '='
  [^'"\s]*                 any character except: ''', '"', whitespace (\n, \r, \t, \f, and " ") (0 or more times (matching the most amount possible))
)*                       end of grouping
"                        '"'
\s?                      whitespace (\n, \r, \t, \f, and " ") (optional (matching the most amount possible))
\/?                      '/' (optional (matching the most amount possible))
>                        '>'