[Text Processing] Common Text-Processing Steps

Posted on 2021-07-18, updated on 2021-09-26, category: Text Processing


A summary of common text-processing steps (updated and extended over time).

Tokenization
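
The post does not say which tokenizer it has in mind, so the following is only a minimal sketch for English text using the standard-library `re` module. The function name `tokenize` and the regular expression are illustrative assumptions, not part of the original post. Text without spaces between words (Chinese, for example) usually needs a dedicated word-segmentation library instead.

```python
import re

def tokenize(text):
    """Split text into lowercase word tokens (hypothetical helper).

    Runs of letters/digits form a token; an apostrophe is kept only
    when it sits between two such runs, so "I'm" stays one token.
    """
    return re.findall(r"[a-z0-9]+(?:'[a-z0-9]+)*", text.lower())

tokens = tokenize("I'm a handsome boy!")
print(tokens)  # ["i'm", 'a', 'handsome', 'boy']
```

Unlike a bare `text.split()`, this drops the trailing "!" from "boy!", so the same word is not counted under two different spellings.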

Removing useless symbols and words
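
One possible way to do this cleanup, sketched with the standard library only: strip punctuation from the edges of each token and drop stop words. The `STOP_WORDS` set and the `clean_tokens` helper are made up for illustration; a real project would pick its own stop-word list.

```python
import string

# Hypothetical stop-word list, for illustration only.
STOP_WORDS = {"a", "an", "the"}

def clean_tokens(text):
    """Return lowercase tokens with edge punctuation and stop words removed."""
    cleaned = []
    for raw in text.split():
        # strip() only trims the ends, so the apostrophe in "I'm" survives.
        word = raw.strip(string.punctuation).lower()
        if word and word not in STOP_WORDS:
            cleaned.append(word)
    return cleaned

print(clean_tokens("I'm a handsome boy!"))  # ["i'm", 'handsome', 'boy']
```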

Three ways to count word frequency

Reference

Using a plain dict

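text = "I'm a handsome boy!"

frequency = {}

# split() separates on whitespace only, so punctuation stays attached ("boy!")
for word in text.split():
    if word not in frequency:
        # first occurrence: create the entry
        frequency[word] = 1
    else:
        frequency[word] += 1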

Using defaultdict

import collections

# missing keys start at int() == 0, so no membership check is needed
frequency = collections.defaultdict(int)

text = "I'm a handsome boy!"

for word in text.split():
    frequency[word] += 1

Using Counter

import collections

text = "I'm a handsome boy!"
# Counter tallies every item of the iterable in a single call
frequency = collections.Counter(text.split())
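
As a quick check, the sketch below runs all three approaches on one sample sentence (chosen here for illustration, since it contains repeated words) and confirms that they produce the same counts. It also shows `Counter.most_common`, which is a convenient reason to prefer `Counter` when the most frequent words are needed.

```python
import collections

text = "the cat and the dog and the bird"  # sample sentence, not from the post
words = text.split()

# 1. Plain dict, using dict.get to supply a default of 0
plain = {}
for w in words:
    plain[w] = plain.get(w, 0) + 1

# 2. defaultdict(int)
dd = collections.defaultdict(int)
for w in words:
    dd[w] += 1

# 3. Counter
counter = collections.Counter(words)

# All three hold identical counts.
same = plain == dict(dd) == dict(counter)
print(same)  # True

# Counter can also rank the words by frequency.
top_two = counter.most_common(2)
print(top_two)  # [('the', 3), ('and', 2)]
```

A `Counter` also returns 0 for a word it has never seen (`counter["fish"]`) without raising `KeyError`, whereas the plain dict would need `.get("fish", 0)`.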