PYTHON
Sanitizing HTML Tags from User Input Using Regex in Python
Strip HTML tags from user-submitted text with a Python regular expression. The pattern removes markup only, so it works as a basic cleanup step for plain-text output rather than as a complete XSS defense.
import re

def remove_html_tags(text):
    # Match "<", then as few characters as possible, up to the next ">"
    clean = re.compile(r'<.*?>')
    return re.sub(clean, '', text)

html_input = "Hello <b>world</b>! This is a <script>alert('dangerous!');</script> example."
clean_output = remove_html_tags(html_input)
print(clean_output)  # Output: Hello world! This is a alert('dangerous!'); example.
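How it works: `remove_html_tags` strips HTML tags from a string. `re.compile` builds the pattern `<.*?>`, which matches a `<`, then the shortest possible run of characters, then the next `>`; `re.sub` replaces each match with an empty string. Only the tags themselves are removed. Text between tags stays in the result, including the body of a `<script>` element, and because `.` does not match newlines by default, a tag that spans several lines is left in place unless the pattern is compiled with `re.DOTALL`.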
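Since the regex keeps whatever sits between tags, one possible alternative is to let the standard library's `html.parser.HTMLParser` walk the markup and collect only the text nodes, skipping the contents of `<script>` and `<style>` elements. The sketch below is a swapped-in technique, not part of the original snippet; the names `TextExtractor` and `strip_html` are hypothetical, and the choice of which elements to skip is an assumption.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Hypothetical helper: keeps text nodes, drops <script>/<style> bodies."""

    SKIPPED = {"script", "style"}  # assumption: elements whose text is discarded

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._parts = []
        self._skip_depth = 0  # > 0 while inside a skipped element

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Collect text only when we are not inside a skipped element
        if self._skip_depth == 0:
            self._parts.append(data)

    def get_text(self):
        return "".join(self._parts)


def strip_html(text):
    """Return the text content of `text` with tags and script/style bodies removed."""
    parser = TextExtractor()
    parser.feed(text)
    parser.close()  # flush any buffered trailing text
    return parser.get_text()


html_input = "Hello <b>world</b>! This is a <script>alert('dangerous!');</script> example."
print(strip_html(html_input))
```

Here the script body is dropped along with its tags, so `alert('dangerous!');` no longer appears in the result. If the cleaned text is later written into an HTML page, passing it through `html.escape` from the standard library at output time is a common additional step.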