Extract All Links (`href`) from HTML Anchor Tags
Extract every `href` value from the `<a>` tags in an HTML string with a single regular expression in JavaScript.
const htmlString = `<a href="https://example.com/page1" class="link">Link 1</a>
<p>Some text</p>
<a href='http://anothersite.org/path/to/resource' id="my-link">Link 2</a>
<a target="_blank" href="/local/page.html">Local Link</a>`;
const hrefPattern = /<a\s+(?:[^>]*?\s+)?href=(?:"(.*?)"|'(.*?)')/g;
const extractedLinks = [];
let match;
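while ((match = hrefPattern.exec(htmlString)) !== null) {
  // match[1] holds a double-quoted href, match[2] a single-quoted one.
  // The group that did not participate is undefined, so ?? picks the other;
  // unlike ||, it keeps an empty href="" as "" instead of pushing undefined.
  extractedLinks.push(match[1] ?? match[2]);
}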
console.log(extractedLinks);
// Expected: [ 'https://example.com/page1', 'http://anothersite.org/path/to/resource', '/local/page.html' ]
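How it works: The pattern matches `<a` followed by whitespace, and the optional group `(?:[^>]*?\s+)?` skips any attributes that come before `href` without running past the end of the tag. The value is captured by `"(.*?)"` or `'(.*?)'`; the lazy `.*?` stops at the first closing quote, so attributes after `href` are not swallowed into the result. Because of the `g` flag, each `exec()` call resumes from `hrefPattern.lastIndex`, and the loop ends when `exec()` returns `null`. A regular expression is not an HTML parser: this one misses unquoted values (`href=/page`), whitespace around `=`, uppercase `<A HREF=...>`, and anchors with a `>` inside an earlier attribute value, and it will also pick up anchors inside HTML comments or scripts. It works well for quick extraction from markup you control; for arbitrary HTML, a real parser (such as `DOMParser` in the browser) is more reliable.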
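If you want a reusable function, the same extraction can be written with `String.prototype.matchAll`, which requires the `g` flag and returns an iterator of match arrays, so no manual `exec()` loop is needed. This is an illustrative variant, not part of the original snippet: `extractHrefs` and `sample` are hypothetical names, and the pattern adds two assumed extensions, the `i` flag for uppercase tags and `\s*` around `=`. Unquoted values and a `>` inside an earlier attribute are still not handled.

```javascript
// Hypothetical helper (not from the original snippet): collects href values
// using String.prototype.matchAll instead of a manual exec() loop.
function extractHrefs(html) {
  // Same structure as hrefPattern, with two assumed extensions:
  //  - the `i` flag, so <A HREF="..."> also matches
  //  - \s* around `=`, so href = "..." is accepted
  const pattern = /<a\s+(?:[^>]*?\s+)?href\s*=\s*(?:"(.*?)"|'(.*?)')/gi;
  // matchAll throws a TypeError without the g flag; each item is a match array.
  // ?? keeps an empty href="" as "" rather than falling through to undefined.
  return Array.from(html.matchAll(pattern), (m) => m[1] ?? m[2]);
}

const sample = `<A HREF="https://example.com/page1">Link 1</A>
<a class="x" href = '/docs'>Docs</a>
<a href="">Empty</a>
<a data-href="/ignored">Not an href attribute</a>`;

console.log(extractHrefs(sample));
// → [ 'https://example.com/page1', '/docs', '' ]
```

The `data-href` anchor is skipped because the pattern requires whitespace (or the tag name) directly before `href`. If you would rather drop empty values, append `.filter(Boolean)` to the returned array.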