JAVASCRIPT
Extract All Link URLs from HTML
A JavaScript regex snippet that finds every `<a>` tag in an HTML string and extracts its `href` attribute value, for lightweight scraping or content processing.
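```javascript
/**
 * Extracts the href value of every <a> tag in an HTML string.
 * Regex-based: handles quoted href attributes in simple markup only.
 */
function extractLinkHrefs(htmlString) {
  // Group 1 captures the opening quote; \1 requires the same quote to close,
  // so a space or the other quote character inside the URL is kept.
  const hrefRegex = /<a\s+(?:[^>]*?\s+)?href\s*=\s*(["'])(.*?)\1/gi;
  const hrefs = [];
  let match;
  while ((match = hrefRegex.exec(htmlString)) !== null) {
    // match[2] contains the captured href value
    hrefs.push(match[2]);
  }
  return hrefs;
}

// Example usage:
// const html = '<p>Hello</p><a href="https://example.com">Link 1</a><a class="btn" href="/page">Link 2</a>';
// console.log(extractLinkHrefs(html)); // ["https://example.com", "/page"]
```

How it works: The function runs a global (`g`), case-insensitive (`i`) regular expression over the HTML string, calling `exec` in a loop. Because of the `g` flag, each call resumes from the regex's `lastIndex`, so the loop visits every `<a>` tag in turn and stops when `exec` returns `null`. The first capture group records which quote opened the `href` value, and the `\1` backreference requires the same quote to close it, so the second group holds the complete URL. The collected values are returned as an array in document order. Because this is a regex rather than an HTML parser, it suits simple, well-formed markup: it skips unquoted `href` values and does not decode HTML entities such as `&amp;`.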
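On engines that support `String.prototype.matchAll` (ES2020 and later), the same loop can be written without managing `exec` state by hand. The sketch below is an alternative, not the snippet's original approach; the function name `extractLinkHrefsMatchAll` is hypothetical. It uses a quote backreference, so a URL containing a space or the other quote character is captured whole. Like any regex approach, it still assumes quoted `href` attributes.

```javascript
// Hypothetical matchAll-based variant (requires ES2020+ for String.prototype.matchAll).
function extractLinkHrefsMatchAll(htmlString) {
  // (["']) captures the opening quote; \1 requires the matching closing quote.
  // The g flag is mandatory: matchAll throws a TypeError without it.
  const hrefRegex = /<a\s+(?:[^>]*?\s+)?href\s*=\s*(["'])(.*?)\1/gi;
  // matchAll yields one match array per <a> tag; group 2 is the href value.
  return Array.from(htmlString.matchAll(hrefRegex), (m) => m[2]);
}

const html =
  '<a href="https://example.com/it\'s">A</a>' +
  "<A HREF='/my page.html'>B</A>" +
  '<a name="top">No link</a>';

console.log(JSON.stringify(extractLinkHrefsMatchAll(html)));
// ["https://example.com/it's","/my page.html"]
```

The third tag has no `href`, so it contributes nothing. The uppercase `<A HREF=...>` still matches because of the `i` flag.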