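I wanted to replace the Spanish characters áéíóúÁÉÍÓÚñÑüÜ with their plain English counterparts and remove ¡ and ¿, so that later data-processing steps are easier, so I wrote a small program to handle these special characters.
It takes the brute-force approach: walk through every character in the sentence and, whenever one of the special characters above turns up, replace it with the corresponding letter (or drop it, in the case of ¡ and ¿).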
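The same substitution can also be written as a translation table rather than an explicit loop. What follows is only a sketch, assuming Python 3 (where `str` holds Unicode text). The names `ACCENT_TABLE` and `strip_spanish_accents` are made up for illustration and are not part of my program.

```python
# Hypothetical helper names; assumes Python 3, where str holds Unicode text.
ACCENT_TABLE = str.maketrans(
    "áéíóúÁÉÍÓÚñÑüÜ",  # characters to replace
    "aeiouAEIOUnNuU",  # their unaccented counterparts, position by position
    "¡¿",              # characters to delete outright
)


def strip_spanish_accents(text):
    """Return text with accented letters replaced and ¡ ¿ removed."""
    return text.translate(ACCENT_TABLE)


print(strip_spanish_accents("Juchitán de Za¿ragozaá"))
```

Since deletions are handled by the table itself, this version needs no bookkeeping for positions that shift after a character is removed.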
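Note that the line # -*- coding: ISO8859-1 -*- has to go at the very top of the file, and the file has to be saved in that same encoding. Otherwise the script fails with an error, and the special characters show up garbled the next time the file is opened.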
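To see why the declared encoding has to match how the file is stored, the short sketch below (Python 3 assumed, for illustration only and not part of the program) encodes one accented letter both ways and then reads the UTF-8 bytes back with the wrong codec.

```python
# Assumes Python 3. Shows that the same letter is stored differently
# in ISO 8859-1 and in UTF-8.
accented = "\u00e1"  # the letter á, written as an escape so the snippet is encoding-independent

latin1_bytes = accented.encode("iso8859-1")  # a single byte
utf8_bytes = accented.encode("utf-8")        # two bytes

print(latin1_bytes)  # b'\xe1'
print(utf8_bytes)    # b'\xc3\xa1'

# Decoding UTF-8 bytes as ISO 8859-1 produces two wrong characters instead of one á
garbled = utf8_bytes.decode("iso8859-1")
print(len(garbled))  # 2
```

That mismatch is the kind of garbling that appears when an editor and the interpreter disagree about the file's encoding.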
# -*- coding: ISO8859-1 -*-
sentence = "Juchitán de Za¿ragozaá"
# Replace the character at index pos with point_char
def replace_char(original_column, pos, point_char):
    replaced_column = original_column[:pos] + point_char + original_column[pos+1:]
    return replaced_column

# Delete the character at index pos
def delete_char(original_column, pos):
    deleted_column = original_column[:pos] + original_column[pos+1:]
    return deleted_column

# Build the dictionary of special Spanish characters;
# a value of 0 means the character should be deleted rather than replaced
special_Spanish_chars_list = ['á', 'é', 'í', 'ó', 'ú', 'Á', 'É', 'Í', 'Ó', 'Ú', 'ñ', 'Ñ', '¡', '¿', 'ü', 'Ü']
English_chars_list = ['a', 'e', 'i', 'o', 'u', 'A', 'E', 'I', 'O', 'U', 'n', 'N', 0, 0, 'u', 'U']
special_Spanish_chars_dict = dict(zip(special_Spanish_chars_list, English_chars_list))

cleaned_sentence = sentence
sentence_length = len(sentence)
delete_num = 0  # number of characters deleted so far, used to correct the index

# Walk through every character in the sentence
for j in range(sentence_length):
    try:
        character = sentence[j]
        point_char = special_Spanish_chars_dict[character]
        if point_char != 0:  # replace
            cleaned_sentence = replace_char(cleaned_sentence, j - delete_num, point_char)
        else:  # delete
            cleaned_sentence = delete_char(cleaned_sentence, j - delete_num)
            delete_num += 1
    except KeyError:  # not a special character, leave it unchanged
        continue

print("Original sentence: {}".format(sentence))
print("Cleaned sentence: {}".format(cleaned_sentence))
Result: