PYTHON
Parse CSV Line with Quoted Fields Using Regex
Parse a single CSV line into a list of fields, correctly handling commas and escaped quotes (`""`) inside double-quoted values, using Python's `re` module.
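import re

def parse_csv_line(line):
    # Each match is one field together with the delimiter in front of it:
    # (?:^|,) anchors the field at the start of the line or just after a comma.
    # The field is either double-quoted (group 1, where "" is an escaped quote)
    # or unquoted (group 2, any run of characters other than a comma).
    # Without the anchor, [^,]* would also match the empty string at every
    # comma and add spurious empty fields to the result.
    field_pattern = re.compile(r'(?:^|,)(?:"((?:[^"]|"")*)"|([^,]*))')
    fields = []
    # finditer scans the line left to right and yields every match in order.
    for match in field_pattern.finditer(line):
        if match.group(1) is not None:  # Quoted field: unescape "" to "
            fields.append(match.group(1).replace('""', '"'))
        else:  # Unquoted field (may be empty)
            fields.append(match.group(2))
    return fields

# Example usage:
csv_line1 = 'apple,banana,"cherry,grape",date'
print(f"'{csv_line1}' -> {parse_csv_line(csv_line1)}")
# 'apple,banana,"cherry,grape",date' -> ['apple', 'banana', 'cherry,grape', 'date']
csv_line2 = '"first field","second field with ""escaped"" quotes",last'
print(f"'{csv_line2}' -> {parse_csv_line(csv_line2)}")
# '"first field","second field with ""escaped"" quotes",last' -> ['first field', 'second field with "escaped" quotes', 'last']
How it works: The regular expression `(?:^|,)(?:"((?:[^"]|"")*)"|([^,]*))` matches one field at a time. The leading `(?:^|,)` ties each match to the start of the line or to the comma before the field, so every field is matched exactly once and empty fields between consecutive commas are preserved. The rest of the pattern matches either a quoted field, capturing its content and allowing `""` as an escaped double quote, or an unquoted field made of any characters other than a comma. `finditer` walks the line from left to right; for each match the function appends the quoted content with `""` collapsed to `"`, or the unquoted text as is. The pattern does not validate malformed input: text that follows a closing quote before the next comma is silently skipped.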
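Cross-check: The standard library's `csv` module applies the same quoting rules, so it can serve as a reference when testing the regex approach. The sketch below is only an illustration under that assumption; the helper name `parse_csv_line_stdlib` and the sample lines are hypothetical and not part of the snippet above.

```python
import csv

def parse_csv_line_stdlib(line):
    # Hypothetical helper for comparison, not part of the regex snippet.
    # csv.reader accepts any iterable of strings, so a one-element list
    # makes it parse exactly one line; next() pulls out that single row.
    return next(csv.reader([line]))

samples = [
    'apple,banana,"cherry,grape",date',
    '"first field","second field with ""escaped"" quotes",last',
    'a,,b,',  # empty field in the middle and at the end
]
for sample in samples:
    print(f"'{sample}' -> {parse_csv_line_stdlib(sample)}")
```

Running a regex-based parser and this helper over the same sample lines and comparing the resulting lists is a cheap way to spot fields that were dropped, duplicated, or left with unescaped quotes.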