PYTHON

Parse CSV Line with Quoted Fields Using Regex

Parse a single CSV line into a list of fields with Python's re module, correctly handling commas inside double-quoted values.
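To see why a regex is needed at all, here is what a plain `str.split(",")` does with the first example line used below. It has no notion of quoting, so the quoted field `"cherry,grape"` is split into two fragments that still carry their quote characters.

```python
# Naive approach: split on every comma, ignoring double quotes entirely.
line = 'apple,banana,"cherry,grape",date'
naive_fields = line.split(",")
print(naive_fields)
# ['apple', 'banana', '"cherry', 'grape"', 'date']
```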

import re

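# Each match consumes one field plus the separator in front of it: the start
# of the line for the first field, a comma for every later one. Anchoring on
# the separator stops finditer() from producing spurious empty matches.
#   group(1): contents of a double-quoted field, where "" is an escaped quote
#   group(2): an unquoted field (any run of characters other than a comma)
FIELD_PATTERN = re.compile(r'(?:^|,)(?:"((?:[^"]|"")*)"|([^,]*))')

def parse_csv_line(line):
    """Split one CSV line into a list of field strings."""
    fields = []
    # finditer() yields every non-overlapping match from left to right.
    for match in FIELD_PATTERN.finditer(line):
        if match.group(1) is not None:  # Quoted field: turn "" back into "
            fields.append(match.group(1).replace('""', '"'))
        else:  # Unquoted field (may be empty, e.g. between two commas)
            fields.append(match.group(2))
    return fields

# Example usage:
csv_line1 = 'apple,banana,"cherry,grape",date'
print(f"'{csv_line1}' -> {parse_csv_line(csv_line1)}")
# 'apple,banana,"cherry,grape",date' -> ['apple', 'banana', 'cherry,grape', 'date']

csv_line2 = '"first field","second field with ""escaped"" quotes",last'
print(f"'{csv_line2}' -> {parse_csv_line(csv_line2)}")
# '"first field","second field with ""escaped"" quotes",last' -> ['first field', 'second field with "escaped" quotes', 'last']

csv_line3 = 'a,,"",d'
print(f"'{csv_line3}' -> {parse_csv_line(csv_line3)}")
# 'a,,"",d' -> ['a', '', '', 'd']

How it works: The regular expression `(?:^|,)(?:"((?:[^"]|"")*)"|([^,]*))` matches one field at a time. Each match begins at the start of the line or at a separating comma, then captures either a double-quoted field in group 1 (inside which `""` stands for a literal quote, so embedded commas are kept) or an unquoted field in group 2. The function walks through all matches with `finditer()`, converts `""` back to `"` in quoted fields, and collects the values in a list. Anchoring each match on the separator is what keeps the output clean: without it, the unquoted alternative `[^,]*` also matches the empty string at each comma and at the end of the line, which inserts bogus empty fields into the result. Note that this handles one line at a time, so quoted fields that contain line breaks are not supported.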
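If you do not need a hand-written regex, Python's standard library `csv` module implements the same quoting rules in its default `excel` dialect (comma delimiter, `"` as the quote character, `""` as an escaped quote). The sketch below wraps `csv.reader` for a single line; the function name `parse_csv_line_stdlib` is only an illustrative choice, not part of any library.

```python
import csv

def parse_csv_line_stdlib(line):
    """Parse one CSV line using the standard library's csv module.

    csv.reader expects an iterable of lines, so the single line is wrapped
    in a list and next() returns the first (and only) parsed row.
    """
    return next(csv.reader([line]))

print(parse_csv_line_stdlib('apple,banana,"cherry,grape",date'))
# ['apple', 'banana', 'cherry,grape', 'date']
print(parse_csv_line_stdlib('"first field","second field with ""escaped"" quotes",last'))
# ['first field', 'second field with "escaped" quotes', 'last']
```

When reading a whole file rather than one string, passing the file object (opened with `newline=''`) straight to `csv.reader` also lets it handle quoted fields that span several lines, which the single-line regex cannot do.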
