JAVASCRIPT

Extract All Links (`href`) from HTML Anchor Tags

Extract every `href` value from the `<a>` tags in an HTML string with a single regular expression in JavaScript.

const htmlString = `<a href="https://example.com/page1" class="link">Link 1</a>
<p>Some text</p>
<a href='http://anothersite.org/path/to/resource' id="my-link">Link 2</a>
<a target="_blank" href="/local/page.html">Local Link</a>`;

const hrefPattern = /<a\s+(?:[^>]*?\s+)?href=(?:"(.*?)"|'(.*?)')/g;
const extractedLinks = [];
let match;

while ((match = hrefPattern.exec(htmlString)) !== null) {
  // match[1] holds a double-quoted href, match[2] a single-quoted one.
  // `??` (rather than `||`) keeps an empty href="" as '' instead of undefined.
  extractedLinks.push(match[1] ?? match[2]);
}

console.log(extractedLinks);
// Expected: [ 'https://example.com/page1', 'http://anothersite.org/path/to/resource', '/local/page.html' ]

How it works: The pattern finds each `<a>` tag and captures the value of its `href` attribute. The lazy quantifier `(.*?)` stops at the first closing quote, and the two alternatives handle double-quoted and single-quoted values. The optional group `(?:[^>]*?\s+)?` skips any attributes that appear before `href`. Because of the `g` flag, each `exec` call resumes from the end of the previous match, so the loop collects every occurrence in the string. Unquoted values and uppercase tag or attribute names are not matched by this pattern.
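
The same approach can be sketched more compactly with `String.prototype.matchAll`. This is a minimal variant, not the original author's code. The helper name `extractHrefs` and the `sample` string are hypothetical, and the pattern is loosened on the assumption that uppercase tags and spaces around `=` should also be accepted.

```javascript
// Hypothetical helper: the name extractHrefs is an assumption, not part of the original snippet.
function extractHrefs(html) {
  // Same idea as the pattern above, with the `i` flag for uppercase tags
  // and optional whitespace around `=`.
  const pattern = /<a\s+(?:[^>]*?\s+)?href\s*=\s*(?:"([^"]*)"|'([^']*)')/gi;
  // matchAll requires the `g` flag and yields one match array per occurrence.
  // Group 1 is the double-quoted value, group 2 the single-quoted one.
  return Array.from(html.matchAll(pattern), (m) => m[1] ?? m[2]);
}

// Illustrative input only.
const sample = `<A HREF="https://example.com/a">A</A> <a class="x" href = '/b'>B</a> <a href="">empty</a>`;

console.log(extractHrefs(sample));
// [ 'https://example.com/a', '/b', '' ]
```

Like the original, this is a pattern match rather than an HTML parse, so it can also pick up anchors inside comments or script text. For untrusted or irregular markup, a real HTML parser is usually the safer choice.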
