Stanford Chinese Segmenter
Tokenization of raw text is a standard pre-processing step for many NLP tasks. For English, tokenization usually involves punctuation splitting and separation of some affixes like possessives. Other languages require more extensive token pre-processing, which is usually called segmentation. The Stanford Word Segmenter currently supports Arabic and Chinese.
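The segmenter is distributed as a Java library built around a CRF sequence model. As a rough sketch (not the package's bundled demo verbatim), the snippet below shows one way to load the Chinese Treebank (CTB) model and segment a sentence programmatically; the data directory and the file names ctb.gz and dict-chris6.ser.gz are assumptions about how the downloaded archive is unpacked.

// Minimal sketch, assuming the segmenter data was unpacked into ./data
import edu.stanford.nlp.ie.crf.CRFClassifier;
import edu.stanford.nlp.ling.CoreLabel;

import java.util.List;
import java.util.Properties;

public class SegmenterDemo {
  public static void main(String[] args) {
    String basedir = "data";  // assumed location of models and dictionaries

    Properties props = new Properties();
    props.setProperty("sighanCorporaDict", basedir);
    props.setProperty("serDictionary", basedir + "/dict-chris6.ser.gz");
    props.setProperty("inputEncoding", "UTF-8");
    props.setProperty("sighanPostProcessing", "true");

    // Load the Chinese Treebank (CTB) segmentation model.
    CRFClassifier<CoreLabel> segmenter = new CRFClassifier<>(props);
    segmenter.loadClassifierNoExceptions(basedir + "/ctb.gz", props);

    // Segment a raw Chinese sentence into a list of words.
    List<String> words = segmenter.segmentString("面向未来的希望与挑战");
    System.out.println(String.join(" ", words));
  }
}

The same models can also be run from the command line via the scripts shipped in the distribution, which is usually simpler for batch processing of text files.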
User comments
No installation file included; a waste of points.