JAVASCRIPT

Extracting All Links and HREFs from HTML String

A JavaScript regex solution to parse an HTML string and extract all anchor (<a>) tags along with their href attributes, useful for web scraping or content analysis.

const extractLinksAndHrefs = (htmlString) => {
  const links = [];
  // This regex captures the href attribute and the inner text of <a> tags.
  // It's a simple approach and might not handle complex or malformed HTML perfectly.
  const linkRegex = /<a\s+(?:[^>]*?\s+)?href=(["'])(.*?)\1(?:[^>]*?)?>(.*?)<\/a>/gi;
  let match;

  while ((match = linkRegex.exec(htmlString)) !== null) {
    links.push({
      href: match[2],
      text: match[3].trim()
    });
  }
  return links;
};

const htmlContent = `
      <p>Visit <a href="https://example.com/page1">Page One</a> or
      <a class="nav-link" href="/local/path/to/page2.html" target="_blank">Page Two</a>.</p>
      <img src="/image.jpg">
      <a href="mailto:[email protected]">Email Us</a>
    `;

console.log(extractLinksAndHrefs(htmlContent));
/* Expected Output:
[
  { href: "https://example.com/page1", text: "Page One" },
  { href: "/local/path/to/page2.html", text: "Page Two" },
  { href: "mailto:[email protected]", text: "Email Us" }
]
*/
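
How it works: The `extractLinksAndHrefs` function runs a global, case-insensitive regular expression over the HTML string by calling `exec` in a loop. Each match corresponds to one `<a>` tag with a quoted `href` attribute. Capture group 1 is the opening quote (reused as `\1`, so the closing quote must be the same character), group 2 is the `href` value, and group 3 is the inner text of the link, which is trimmed before being stored. The `g` flag makes each `exec` call resume where the previous match ended, so all matches are found, and the `i` flag makes the match case-insensitive. The result is an array of `{ href, text }` objects. Because `.` does not match line breaks by default, anchors whose inner text spans several lines are skipped, and unquoted `href` values are not matched.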
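
The same idea can be sketched with `String.prototype.matchAll` and named capture groups, which removes the manual `exec` loop. This is a minimal variant, not a replacement for a real HTML parser. The function name `extractLinksWithMatchAll`, the group names, and the sample markup are illustrative assumptions. It also uses `[\s\S]*?` so the inner text may span several lines, and it strips nested tags from the text.

```javascript
// Hypothetical variant of extractLinksAndHrefs; names here are illustrative only.
const extractLinksWithMatchAll = (htmlString) => {
  // Named groups: quote (opening quote), href (attribute value), text (inner HTML).
  // [\s\S]*? is used instead of .*? so the inner text can contain line breaks.
  const linkRegex =
    /<a\s+(?:[^>]*?\s+)?href=(?<quote>["'])(?<href>.*?)\k<quote>[^>]*>(?<text>[\s\S]*?)<\/a>/gi;

  // matchAll requires the g flag and yields one match object per anchor.
  return Array.from(htmlString.matchAll(linkRegex), (match) => ({
    href: match.groups.href,
    // Remove nested tags such as <strong>, then collapse runs of whitespace.
    text: match.groups.text.replace(/<[^>]+>/g, '').replace(/\s+/g, ' ').trim(),
  }));
};

const sample = `
  <A HREF='/docs'>Read
    the <strong>docs</strong></A>
  <a class="nav-link" href="https://example.com/page1" target="_blank">Page One</a>
`;

console.log(extractLinksWithMatchAll(sample));
/* Expected Output:
[
  { href: "/docs", text: "Read the docs" },
  { href: "https://example.com/page1", text: "Page One" }
]
*/
```

Like the original, this sketch still ignores unquoted `href` values and can be confused by malformed markup. For untrusted or complex HTML, a DOM-based parser is generally the safer choice.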
