JAVASCRIPT
Sanitizing Input: Removing HTML Tags with Regex
Strip HTML tags from user-submitted content with a simple regular expression in JavaScript to get plain text for display. This reduces the risk of basic XSS but is not a complete defense on its own.
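function stripHtmlTags(htmlString) {
  // Match '<', then any run of characters other than '>', then an optional closing '>'.
  // The trailing '?' makes the '>' optional, so an unterminated tag such as "<img src=x" is removed too.
  const htmlTagRegex = /<[^>]*>?/g;
  return htmlString.replace(htmlTagRegex, '');
}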
// Examples:
const dirtyHtml = "<p>This is <b>some</b> <i>HTML</i> content with <script>alert('XSS');</script> evil code.</p>";
const cleanText = stripHtmlTags(dirtyHtml);
console.log(cleanText); // "This is some HTML content with alert('XSS'); evil code."
const anotherDirtyHtml = `Line 1<br>Line 2<img src='x' onerror='alert("broken")'>`;
const anotherCleanText = stripHtmlTags(anotherDirtyHtml);
console.log(anotherCleanText); // "Line 1Line 2"
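How it works: This snippet provides a function that removes HTML tags from a string using a regular expression. The `htmlTagRegex` matches a '<', followed by any number of characters that are not '>', followed by an optional '>'. The trailing '?' does not make the match non-greedy; it makes the closing '>' optional, so an unterminated tag at the end of the input is removed as well. The `g` flag makes `replace()` substitute every match with an empty string rather than only the first. Only the tags themselves are removed: text between tags is kept, including the contents of a `<script>` element, which remain in the output as plain text. The pattern can also remove legitimate text that contains a '<', for example a comparison such as "1 < 2". Treat this as basic cleanup, not a complete XSS prevention mechanism: escape the result when inserting it into HTML, or use a dedicated sanitizer for untrusted markup.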
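One way to address the limitations above is to drop `<script>` and `<style>` elements together with their contents before stripping tags, and then escape the remaining text for output. The following is a minimal sketch, not a vetted sanitizer; the helper names `removeScriptAndStyleBlocks`, `escapeHtml`, and `toSafeText` are hypothetical and do not come from the original snippet.

```javascript
// Hypothetical helper: remove <script> and <style> elements including their contents.
// \1 refers back to the tag name captured by (script|style).
function removeScriptAndStyleBlocks(input) {
  return input.replace(/<(script|style)\b[^>]*>[\s\S]*?<\/\1\s*>/gi, '');
}

// Hypothetical helper: escape characters that have special meaning in HTML.
function escapeHtml(text) {
  const entities = { '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;', "'": '&#39;' };
  return text.replace(/[&<>"']/g, (ch) => entities[ch]);
}

// Hypothetical wrapper: drop script/style blocks, strip remaining tags, then escape.
function toSafeText(input) {
  const withoutBlocks = removeScriptAndStyleBlocks(String(input));
  const withoutTags = withoutBlocks.replace(/<[^>]*>?/g, '');
  return escapeHtml(withoutTags);
}

console.log(toSafeText("<p>Hi <script>alert('XSS');</script>there & welcome</p>"));
// "Hi there &amp; welcome"
```

Escaping is the step that matters most when the result is inserted into a page as HTML. The regex passes only tidy up the text and should not be relied on as the security boundary.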