PYTHON

Extract All URLs from a String

A Python regex-based function that finds and extracts URL-like substrings (http:// and https:// links, www. addresses, and bare domains such as example.net) embedded in a larger text string, for content parsing.

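import re

# Alternatives, tried left to right at each position in the text:
#   1. http(s):// URLs, optionally with a www. prefix
#   2. www. addresses without a scheme
#   3. http(s):// URLs missed by 1, e.g. short hosts like http://t.co
#   4. bare domains such as example.net (also matches non-URLs like "e.g.")
URL_REGEX = re.compile(
    r'https?://(?:www\.|(?!www))[a-zA-Z0-9][a-zA-Z0-9-]+[a-zA-Z0-9]\.[^\s]{2,}'
    r'|www\.[a-zA-Z0-9][a-zA-Z0-9-]+[a-zA-Z0-9]\.[^\s]{2,}'
    r'|https?://[a-zA-Z0-9]+\.[^\s]{2,}'
    r'|[a-zA-Z0-9]+\.[^\s]{2,}'
)

def extract_urls(text):
    """Return all URL-like substrings found in text."""
    # [^\s]{2,} is greedy and also captures sentence punctuation (the final
    # "." in "page.html."), so trim common trailing punctuation from each match.
    return [m.rstrip('.,;:!?\'"') for m in URL_REGEX.findall(text)]

# Example Usage:
text_content = "Visit our website at https://example.com or find more info at http://www.anothersite.org/page.html. Also, check out example.net."
found_urls = extract_urls(text_content)
print(found_urls)
# Expected: ['https://example.com', 'http://www.anothersite.org/page.html', 'example.net']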
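How it works: The function runs a single regular expression through re.findall, which returns every non-overlapping match in the order it appears. The pattern's alternatives cover http:// and https:// URLs (with or without www.), scheme-less www. addresses, and bare domain references such as example.net. It matches on shape rather than validity, so it can also pick up non-links such as "e.g." or "3.14". Because [^\s]{2,} runs to the next whitespace, punctuation directly after a URL (such as a sentence-ending period) ends up in the match unless it is stripped afterwards. It's useful for parsing user-generated content, extracting links from articles, or web scraping tasks.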
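The bare-domain alternative matches strings like "e.g." or "3.14", and the greedy tail can carry a trailing period into a match. One way to reduce such false positives is a post-filter built on the standard library's urllib.parse.urlsplit. This is a minimal sketch, not part of the original snippet: the function name clean_candidate and the host heuristic (at least two non-empty labels, alphabetic top-level label of two or more characters) are assumptions chosen for illustration.

```python
from urllib.parse import urlsplit

def clean_candidate(candidate):
    """Return a trimmed URL if candidate passes a host heuristic, else None."""
    # Drop sentence punctuation that a greedy [^\s]{2,} may have captured.
    url = candidate.rstrip('.,;:!?\'"')
    # urlsplit only populates the host when a scheme is present, so
    # scheme-less candidates such as "example.net" get http:// for parsing.
    to_parse = url if url.startswith(('http://', 'https://')) else 'http://' + url
    try:
        host = urlsplit(to_parse).hostname or ''
    except ValueError:  # e.g. an unbalanced "[" in the host part
        return None
    labels = host.split('.')
    # Heuristic (an assumption): at least two non-empty labels and an
    # alphabetic top-level label of two or more characters.
    if len(labels) >= 2 and all(labels) and labels[-1].isalpha() and len(labels[-1]) >= 2:
        return url
    return None

candidates = ['https://example.com', 'e.g.', '3.14', 'example.net.',
              'http://www.anothersite.org/page.html.']
cleaned = [u for u in (clean_candidate(c) for c in candidates) if u]
print(cleaned)
# ['https://example.com', 'example.net', 'http://www.anothersite.org/page.html']
```

The heuristic is deliberately loose: it still accepts file names such as report.pdf, and it rejects hosts whose top-level label is not purely alphabetic, such as IP addresses (127.0.0.1) or punycode TLDs (xn--p1ai). Adjust it to the kind of text you are parsing.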

Need help integrating this into your project?

Our team of expert developers can help you build your custom application from scratch.

Hire DigitalCodeLabs