PYTHON
Remove Script Tags from Input String
A Python regex pattern that strips `<script>` elements and their contents from user input, removing one common vector for XSS vulnerabilities.
import re
def remove_script_tags(text):
    # Match an opening <script ...> tag through the nearest closing tag.
    # IGNORECASE covers variants like <SCRIPT>; DOTALL lets '.' span newlines.
    # The closing tag tolerates trailing whitespace, e.g. "</script >".
    script_tag_regex = re.compile(r'<script\b[^>]*>.*?</script\s*>', re.IGNORECASE | re.DOTALL)
    sanitized_text = script_tag_regex.sub('', text)
    return sanitized_text
# Example Usage:
user_input = "Hello, this is safe.<script>alert('XSS!');</script>And this is too.<script type='text/javascript'>console.log('Malicious');</script>"
cleaned_input = remove_script_tags(user_input)
print(cleaned_input)
# Expected: "Hello, this is safe.And this is too."
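How it works: This Python function uses a regular expression to find and remove every `<script>` element, including its contents, from an input string. The lazy quantifier `.*?` stops at the nearest closing tag, so text between two separate script blocks is preserved. `re.IGNORECASE` makes the match case-insensitive (`<SCRIPT>`, `<Script>`), and `re.DOTALL` lets `.` match newlines so multi-line script bodies are removed as well. This removes inline script blocks, one of several ways JavaScript can be embedded in user-generated content for Cross-Site Scripting (XSS); event-handler attributes such as `onerror` and `javascript:` URLs are not touched by this pattern.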
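
The effect of the two flags, and one pitfall of a single substitution pass, can be illustrated with a short sketch. It uses a pattern modeled on the one above; the names `SCRIPT_RE` and `remove_script_tags_repeated` are hypothetical and introduced only for this illustration. The pitfall is that removing an inner `<script></script>` pair can splice the surrounding fragments into a new `<script>` tag, so this version repeats the substitution until the text stops changing.

```python
import re

# Illustrative pattern, modeled on the one used in remove_script_tags.
SCRIPT_RE = re.compile(r'<script\b[^>]*>.*?</script\s*>', re.IGNORECASE | re.DOTALL)

def remove_script_tags_repeated(text):
    """Apply the substitution until a pass makes no further change."""
    previous = None
    while previous != text:
        previous = text
        text = SCRIPT_RE.sub('', text)
    return text

# Mixed case and a multi-line body: handled by IGNORECASE and DOTALL.
multiline = "Hi<SCRIPT>\nalert(1);\n</Script>there"
print(remove_script_tags_repeated(multiline))      # Hithere

# Fragments that reassemble into a script tag once the inner pairs are removed.
nested = "<scr<script></script>ipt>alert(1)</scr<script></script>ipt>"
print(SCRIPT_RE.sub('', nested))                   # <script>alert(1)</script>
print(repr(remove_script_tags_repeated(nested)))   # ''
```

Looping until the output is stable is a simple way to handle this reassembly case; it is a sketch of one mitigation rather than a complete sanitizer.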