JAVASCRIPT
Validate and Extract URLs from Text
Use a regular expression to match URL-like strings and extract every match from a block of text in JavaScript, which is useful when processing content for links.
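function findAndValidateUrls(text) {
  // Matches http(s):// URLs, www.-prefixed hosts, and bare "domain.tld" candidates.
  const urlRegex = /(https?:\/\/(?:www\.|(?!www))[a-zA-Z0-9][a-zA-Z0-9-]+[a-zA-Z0-9]\.[^\s]{2,}|www\.[a-zA-Z0-9][a-zA-Z0-9-]+[a-zA-Z0-9]\.[^\s]{2,}|https?:\/\/[a-zA-Z0-9]+\.[^\s]{2,}|[a-zA-Z0-9]+\.[^\s]{2,})/g;
  const urls = text.match(urlRegex) || [];
  // [^\s]{2,} also swallows sentence punctuation (e.g. "example.org."), so trim it from the end of each match.
  return urls.map((url) => url.replace(/[.,;:!?)\]'"]+$/, ""));
}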
// Examples
const text = "Visit our website at https://www.example.com or check out http://blog.example.org. Also, try example.net for more info.";
console.log(findAndValidateUrls(text));
// Expected: ["https://www.example.com", "http://blog.example.org", "example.net"]
console.log(findAndValidateUrls("Invalid URL example. this is not a url")); // []
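How it works: The `findAndValidateUrls` function uses a single regular expression to find URL-like strings in the input. The `urlRegex` pattern has four alternatives: an `http://` or `https://` URL with or without `www.`, a host starting with `www.`, an `http(s)://` URL with a short alphanumeric host, and a bare domain name followed by a dot and at least two non-whitespace characters. The `g` flag makes `match()` return every occurrence as an array; because `match()` returns `null` when nothing matches, `|| []` turns that case into an empty array. The check is permissive rather than a strict validation: the `[^\s]{2,}` tail accepts any run of non-whitespace characters, so a string such as `file.txt` is also treated as a URL.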
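Because the pattern is permissive, a stricter second pass can help when the extracted strings must be real URLs. The following is a minimal sketch, not part of the original snippet: it runs each candidate through the built-in `URL` constructor. The helper name `toValidUrl` is hypothetical, and defaulting scheme-less candidates to `https://` is an assumption.

```javascript
// Hypothetical helper: returns a normalized URL string, or null if the candidate does not parse.
// Assumption: candidates without a scheme are treated as https://.
function toValidUrl(candidate) {
  const withScheme = /^https?:\/\//i.test(candidate) ? candidate : `https://${candidate}`;
  try {
    const parsed = new URL(withScheme);
    // Require a dot in the hostname so single-label hosts such as "localhost" are rejected.
    return parsed.hostname.includes(".") ? parsed.href : null;
  } catch {
    // The URL constructor throws a TypeError on input it cannot parse.
    return null;
  }
}

const candidates = ["https://www.example.com", "example.net", "example.com:abc"];
const valid = candidates.map(toValidUrl).filter((url) => url !== null);
console.log(valid);
// ["https://www.example.com/", "https://example.net/"]
```

Note that `href` is the normalized form, so a trailing `/` is added to bare origins, and `example.com:abc` is dropped because `abc` is not a valid port. Return `candidate` instead of `parsed.href` if the original text must be preserved.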