JAVASCRIPT
Extract Image Source (src) URLs from HTML
Learn how to parse an HTML string with regex to find and extract all `src` attribute values from `<img>` tags for image processing or indexing.
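function extractImageSrcs(htmlString) {
  const srcs = [];
  // Match each <img ...> tag and capture the value of its src attribute.
  // The optional group skips attributes that come before src; the whitespace
  // it requires in front of src keeps names such as data-src from matching.
  // [^"']* captures the value up to the next quote of either kind (a negated
  // character class, not a lazy quantifier), so a value that itself contains
  // a quote character is truncated at that quote.
  const regex = /<img\s+(?:[^>]*?\s+)?src=["']([^"']*)["'](?:[^>]*?)?>/gi;
  let match;
  // With the g flag, exec() resumes after the previous match on each call.
  while ((match = regex.exec(htmlString)) !== null) {
    // Skip empty src values such as src="".
    if (match[1]) {
      srcs.push(match[1]);
    }
  }
  return srcs;
}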
const html = `
<p>Some text here.</p>
<img src="https://example.com/image1.jpg" alt="Image One" class="responsive">
<p>Another paragraph.</p>
<img class='inline' alt='Image Two' src='relative/path/to/image2.png'>
<img src="/assets/image3.webp">
<img alt='No Src'>
`;
const imageSources = extractImageSrcs(html);
console.log(imageSources);
// Expected: [ 'https://example.com/image1.jpg', 'relative/path/to/image2.png', '/assets/image3.webp' ]
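The expected output above mixes absolute URLs with relative paths. For indexing or downloading, those paths usually need to be resolved against the URL of the page they came from. A minimal sketch, assuming a hypothetical helper named `resolveImageSrcs` and an illustrative base URL (neither is part of the original snippet), using the standard `URL` constructor:

```javascript
// Hypothetical helper: resolves each src against the URL of the page it came from.
// The function name and the base URL used below are illustrative assumptions.
function resolveImageSrcs(srcs, baseUrl) {
  const resolved = [];
  for (const src of srcs) {
    try {
      // new URL(relative, base) applies standard URL resolution rules;
      // an already-absolute src ignores the base.
      resolved.push(new URL(src, baseUrl).href);
    } catch {
      // new URL throws a TypeError for values it cannot parse; skip those.
    }
  }
  return resolved;
}

const absolute = resolveImageSrcs(
  ['https://example.com/image1.jpg', 'relative/path/to/image2.png', '/assets/image3.webp'],
  'https://example.org/blog/post.html'
);
console.log(absolute);
// [
//   'https://example.com/image1.jpg',
//   'https://example.org/blog/relative/path/to/image2.png',
//   'https://example.org/assets/image3.webp'
// ]
```

The base must be the URL of the document itself, not just its origin, because a path such as `relative/path/to/image2.png` resolves relative to the document's directory.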
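How it works: `extractImageSrcs` scans an HTML string with a global, case-insensitive regular expression that matches each `<img>` tag and captures the value of its `src` attribute. The pattern accepts single- or double-quoted values and allows other attributes before or after `src`. Tags with a missing or empty `src` are skipped, and the function returns the captured values as an array in document order. Because this is a regex rather than a real HTML parser, unquoted values are not matched and a value that contains a quote character is cut off at that quote.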
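The regex in `extractImageSrcs` requires a quoted value and stops at the first quote character of either kind. A minimal sketch of a looser variant follows, assuming a hypothetical function named `extractImageSrcsLoose` (not part of the original snippet). It gives each quoting style its own alternative, so a double-quoted value may contain an apostrophe and unquoted values are accepted. It is still a regex, so it can be fooled by a `>` inside an attribute value.

```javascript
// Hypothetical variant of extractImageSrcs; the name and pattern are illustrative assumptions.
function extractImageSrcsLoose(htmlString) {
  // Group 1: double-quoted value, group 2: single-quoted value, group 3: unquoted value.
  // The whitespace required before src keeps attributes such as data-src from matching.
  const regex = /<img\s(?:[^>]*?\s)?src\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s"'>]+))[^>]*>/gi;
  const srcs = [];
  // matchAll requires the g flag and yields every match with its capture groups.
  for (const match of htmlString.matchAll(regex)) {
    // Only one of the three groups participates in any given match.
    const src = match[1] ?? match[2] ?? match[3];
    if (src) {
      srcs.push(src);
    }
  }
  return srcs;
}

const sample = `
<img src="it's.jpg">
<img alt=plain src=plain.png>
<img data-src="lazy.jpg" src='real.jpg'>
<img alt="No Src">
`;
console.log(extractImageSrcsLoose(sample));
// [ "it's.jpg", 'plain.png', 'real.jpg' ]
```

In a browser, handing the string to `DOMParser` and reading the `src` of each `img` element avoids these edge cases entirely; the regex approach is mainly useful where no DOM is available.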