Python: remove non-Latin text lines from a CSV
I have a CSV file that contains text strings. Some of the text lines
are in Chinese or Russian, for example.
What I want to do is use Python to count the ASCII characters in each
text line and compare that count with the total number of characters in
the line. If ASCII characters make up more than 90% of the line, I want
to keep it; otherwise I want to remove it from the CSV.
The idea is to remove all non-Latin languages but keep, for example,
text with German umlauts, which is why I want a solution based on the
ratio.
Does anyone have an idea how to solve this task?
Thank you very much!
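
One possible way to do this, as a minimal sketch rather than a definitive
solution: it assumes the file is UTF-8 encoded, that the ratio is taken
over all cells of a row joined together, and that "ASCII" means a code
point below 128. The names ascii_ratio, filter_rows and filter_csv, and
the paths passed to filter_csv, are made up for illustration.

```python
import csv


def ascii_ratio(text):
    """Return the fraction of characters in text that are ASCII.

    Empty text counts as fully ASCII (an assumption, so empty rows are kept).
    """
    if not text:
        return 1.0
    ascii_count = sum(1 for ch in text if ord(ch) < 128)
    return ascii_count / len(text)


def filter_rows(rows, threshold=0.9):
    """Yield only the rows whose joined cell text is more than threshold ASCII."""
    for row in rows:
        if ascii_ratio(" ".join(row)) > threshold:
            yield row


def filter_csv(in_path, out_path, threshold=0.9, encoding="utf-8"):
    """Copy in_path to out_path, dropping rows at or below the ASCII threshold."""
    with open(in_path, newline="", encoding=encoding) as src, \
            open(out_path, "w", newline="", encoding=encoding) as dst:
        csv.writer(dst).writerows(filter_rows(csv.reader(src), threshold))
```

Calling filter_csv("input.csv", "output.csv") would write the filtered
rows to a new file. Note that spaces, digits and punctuation count as
ASCII, and that very short lines can fall below 90% because of a single
umlaut ("Tür" is only two thirds ASCII), so the threshold may need
adjusting for short text.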