How do I pick out the contents of an alt tag with a regex?
Given this text:
<a href="gallery.com/gallery-name"; target="_blank"> <img class="aligncenter" src="myblog.com/wp-content/image.jpg " alt=" want text " width=" 400 " height="300" /></a>
How do I match just the alt text (" want text ")?
I've tried alt=".*"
which yields alt=" want text " width=" 400 " height="300"
which is undesirable.
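For context, the overshoot happens because `.*` is greedy: it runs to the end of the line and then backs up only as far as the last double quote. A minimal Python sketch of the difference (Python is an assumption here, since the question names no language; the variable names are illustrative only):

```python
import re

html = (
    '<a href="gallery.com/gallery-name"; target="_blank"> '
    '<img class="aligncenter" src="myblog.com/wp-content/image.jpg " '
    'alt=" want text " width=" 400 " height="300" /></a>'
)

# Greedy: .* swallows everything up to the LAST double quote on the line.
greedy = re.search(r'alt=".*"', html).group(0)

# Negated class: [^"]* cannot cross a double quote, so it stops at the
# first closing quote after alt=".
bounded = re.search(r'alt="([^"]*)"', html).group(1)

print(greedy)
print(repr(bounded))
```

The negated class fixes this particular string, but it is still a naive pattern; it does not check that `alt=` is a real attribute of an img tag.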
Foreword
You should really use an HTML parser for this, but you seem to have creative control over the source string, and if it's really this simple then the edge cases should be reduced.
Description
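As an illustration of the parser route, here is a minimal sketch using `html.parser` from the Python standard library. The choice of Python, the class name `AltCollector`, and the helper `collect_alts` are assumptions made for this example; only the two img tags from the sample text are fed in, without the surrounding anchor tag.

```python
from html.parser import HTMLParser


class AltCollector(HTMLParser):
    """Collects the alt value of every <img> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.alts = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs with the quotes removed.
        # Self-closing tags such as <img ... /> also end up here by default.
        if tag == "img":
            self.alts.extend(value for name, value in attrs if name == "alt")


def collect_alts(html):
    """Hypothetical helper: return all img alt values found in html."""
    parser = AltCollector()
    parser.feed(html)
    parser.close()
    return parser.alts


html = (
    '<img class="aligncenter" src="myblog.com/wp-content/image.jpg" '
    'alt=" want text" width =" 400 " height="300" />\n'
    "<img onmouseover=' alt=\"this not droid looking for\" ;' "
    'class="aligncenter" src="myblog.com/wp-content/image.jpg" '
    'alt="this droid i\'m looking for." width =" 400 " height="300" />'
)
print(collect_alts(html))
```

Because the parser tokenizes attributes itself, the `alt="..."` text hidden inside the `onmouseover` value is never mistaken for a real attribute.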
<img\s(?=(?:[^>=]|='[^']*'|="[^"]*"|=[^'"][^\s>]*)*?\salt=['"]([^"]*)['"]?)(?:[^>=]|='[^']*'|="[^"]*"|=[^'"\s]*)*"\s?\/?>
This regular expression does the following:
- finds image tags
- requires the image tag to have an alt attribute
- captures the alt attribute value and puts it into capture group 1
- allows attribute values to be surrounded by single quotes, double quotes, or no quotes
- avoids some pretty difficult edge cases that make matching HTML difficult
Example
Live Demo
https://regex101.com/r/cn0ld4/2
Sample text
Note the difficult edge case in the second img tag.
<a href="gallery.com/gallery-name"; target="_blank"> <img class="aligncenter" src="myblog.com/wp-content/image.jpg" alt=" want text" width =" 400 " height="300" /></a>
<img onmouseover=' alt="this not droid looking for" ;' class="aligncenter" src="myblog.com/wp-content/image.jpg" alt="this droid i'm looking for." width =" 400 " height="300" />
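To see why the second tag is a difficult case, consider what a simple `alt="([^"]*)"` pattern does with it. The short Python sketch below is illustrative only (the language and variable names are assumptions); it shows the simple pattern picking up the text embedded in the `onmouseover` value rather than the real alt attribute.

```python
import re

tag = (
    "<img onmouseover=' alt=\"this not droid looking for\" ;' "
    'class="aligncenter" src="myblog.com/wp-content/image.jpg" '
    'alt="this droid i\'m looking for." width =" 400 " height="300" />'
)

# The simple pattern stops at the first alt="..." it sees, which here sits
# inside the single-quoted onmouseover value, not in a real attribute.
naive = re.search(r'alt="([^"]*)"', tag).group(1)
print(naive)
```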
Sample Matches
- Capture group 0 gets the entire img tag
- Capture group 1 gets the value in the alt attribute, not including any surrounding quotes
[0][0] = <img class="aligncenter" src="myblog.com/wp-content/image.jpg" alt=" want text" width =" 400 " height="300" />
[0][1] =  want text
[1][0] = <img onmouseover=' alt="this not droid looking for" ;' class="aligncenter" src="myblog.com/wp-content/image.jpg" alt="this droid i'm looking for." width =" 400 " height="300" />
[1][1] = this droid i'm looking for.
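The matches above can be reproduced with a short script. This is a sketch in Python (an assumption, since the answer only links to a regex tester); the names `IMG_ALT` and `sample` are illustrative, and the pattern is the one from the Description, written with no whitespace inside it.

```python
import re

# The expression from the Description, split across two raw strings
# purely to keep the source lines short.
IMG_ALT = re.compile(
    r"""<img\s(?=(?:[^>=]|='[^']*'|="[^"]*"|=[^'"][^\s>]*)*?\salt=['"]([^"]*)['"]?)"""
    r"""(?:[^>=]|='[^']*'|="[^"]*"|=[^'"\s]*)*"\s?\/?>"""
)

sample = (
    '<a href="gallery.com/gallery-name"; target="_blank"> '
    '<img class="aligncenter" src="myblog.com/wp-content/image.jpg" '
    'alt=" want text" width =" 400 " height="300" /></a>\n'
    "<img onmouseover=' alt=\"this not droid looking for\" ;' "
    'class="aligncenter" src="myblog.com/wp-content/image.jpg" '
    'alt="this droid i\'m looking for." width =" 400 " height="300" />'
)

matches = list(IMG_ALT.finditer(sample))
for i, m in enumerate(matches):
    print(f"[{i}][0] = {m.group(0)}")  # the entire img tag
    print(f"[{i}][1] = {m.group(1)}")  # the alt value, quotes excluded
```

Note that group 1 keeps any whitespace inside the quotes, so the first capture begins with a space.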
Explanation
NODE                     EXPLANATION
----------------------------------------------------------------------
<img                     '<img'
----------------------------------------------------------------------
\s                       whitespace (\n, \r, \t, \f, and " ")
----------------------------------------------------------------------
(?=                      look ahead to see if there is:
----------------------------------------------------------------------
(?:                      group, but do not capture (0 or more times (matching the least amount possible)):
----------------------------------------------------------------------
[^>=]                    any character except: '>', '='
----------------------------------------------------------------------
|                        OR
----------------------------------------------------------------------
='                       '=\''
----------------------------------------------------------------------
[^']*                    any character except: ''' (0 or more times (matching the most amount possible))
----------------------------------------------------------------------
'                        '\''
----------------------------------------------------------------------
|                        OR
----------------------------------------------------------------------
="                       '="'
----------------------------------------------------------------------
[^"]*                    any character except: '"' (0 or more times (matching the most amount possible))
----------------------------------------------------------------------
"                        '"'
----------------------------------------------------------------------
|                        OR
----------------------------------------------------------------------
=                        '='
----------------------------------------------------------------------
[^'"]                    any character except: ''', '"'
----------------------------------------------------------------------
[^\s>]*                  any character except: whitespace (\n, \r, \t, \f, and " "), '>' (0 or more times (matching the most amount possible))
----------------------------------------------------------------------
)*?                      end of grouping
----------------------------------------------------------------------
\s                       whitespace (\n, \r, \t, \f, and " ")
----------------------------------------------------------------------
alt=                     'alt='
----------------------------------------------------------------------
['"]                     any character of: ''', '"'
----------------------------------------------------------------------
(                        group and capture to \1:
----------------------------------------------------------------------
[^"]*                    any character except: '"' (0 or more times (matching the most amount possible))
----------------------------------------------------------------------
)                        end of \1
----------------------------------------------------------------------
['"]?                    any character of: ''', '"' (optional (matching the most amount possible))
----------------------------------------------------------------------
)                        end of look-ahead
----------------------------------------------------------------------
(?:                      group, but do not capture (0 or more times (matching the most amount possible)):
----------------------------------------------------------------------
[^>=]                    any character except: '>', '='
----------------------------------------------------------------------
|                        OR
----------------------------------------------------------------------
='                       '=\''
----------------------------------------------------------------------
[^']*                    any character except: ''' (0 or more times (matching the most amount possible))
----------------------------------------------------------------------
'                        '\''
----------------------------------------------------------------------
|                        OR
----------------------------------------------------------------------
="                       '="'
----------------------------------------------------------------------
[^"]*                    any character except: '"' (0 or more times (matching the most amount possible))
----------------------------------------------------------------------
"                        '"'
----------------------------------------------------------------------
|                        OR
----------------------------------------------------------------------
=                        '='
----------------------------------------------------------------------
[^'"\s]*                 any character except: ''', '"', whitespace (\n, \r, \t, \f, and " ") (0 or more times (matching the most amount possible))
----------------------------------------------------------------------
)*                       end of grouping
----------------------------------------------------------------------
"                        '"'
----------------------------------------------------------------------
\s?                      whitespace (\n, \r, \t, \f, and " ") (optional (matching the most amount possible))
----------------------------------------------------------------------
\/?                      '/' (optional (matching the most amount possible))
----------------------------------------------------------------------
>                        '>'
----------------------------------------------------------------------
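The same breakdown can be kept next to the pattern itself by writing the expression in free-spacing mode. The sketch below assumes Python and its `re.VERBOSE` flag, which ignores whitespace in the pattern and treats `#` as a comment marker; the name `IMG_ALT_VERBOSE` is illustrative, and the comments paraphrase the table above.

```python
import re

IMG_ALT_VERBOSE = re.compile(r"""
    <img\s                  # literal '<img', then one whitespace character
    (?=                     # look ahead without consuming anything
      (?:                   #   skip over the preceding attributes, lazily:
          [^>=]             #     any character except '>' and '='
        | ='[^']*'          #     or a single-quoted value
        | ="[^"]*"          #     or a double-quoted value
        | =[^'"][^\s>]*     #     or an unquoted value
      )*?
      \salt=['"]            #   whitespace, 'alt=', and an opening quote
      ([^"]*)               #   group 1: everything up to the next double quote
      ['"]?                 #   optional closing quote
    )
    (?:                     # now consume the attributes for real, greedily:
        [^>=]
      | ='[^']*'
      | ="[^"]*"
      | =[^'"\s]*
    )*
    "\s?\/?>                # a double quote, optional whitespace and slash, '>'
""", re.VERBOSE)

tag = (
    "<img onmouseover=' alt=\"this not droid looking for\" ;' "
    'class="aligncenter" src="myblog.com/wp-content/image.jpg" '
    'alt="this droid i\'m looking for." width =" 400 " height="300" />'
)
m = IMG_ALT_VERBOSE.search(tag)
print(m.group(1))
```

Since the capture group sits inside the look-ahead, group 1 is filled in before the rest of the tag is consumed, and the look-ahead's attribute-by-attribute skipping is what keeps the `alt=` inside `onmouseover` from being picked up.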