I haven’t posted anything in a couple of weeks because I have been working on my new project.  It is a Chinese segmentation tool that divides any Chinese text into words and analyzes them for frequency and difficulty.  It also generates flashcards from the resulting word list and lets the user select which cards to download and save.  Finally, it supports reading texts with annotated definitions.

At the moment, I’m only rolling out the Pre-Alpha version.  In this version, the 读者 (DuZhe) Text Analyzer segments the text into words and analyzes it for difficulty.
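This post doesn't say which segmentation method DuZhe uses, so the sketch below is only an illustration of the general idea, not the tool's actual code. It uses greedy forward maximum matching, a classic dictionary-based technique: at each position, take the longest dictionary word that starts there, falling back to a single character. It then counts word frequencies. The names `segment_fmm`, `word_frequencies`, and the tiny `TOY_LEXICON` are hypothetical and made up for this example.

```python
from collections import Counter

# Hypothetical toy lexicon for illustration only; a real segmenter would
# load a full dictionary instead of this handful of words.
TOY_LEXICON = {"我们", "喜欢", "学习", "中文", "学", "习"}


def segment_fmm(text, lexicon, max_len=4):
    """Greedy forward maximum matching: at each position, take the longest
    lexicon word (up to max_len characters) that starts there; if none
    matches, emit a single character."""
    words = []
    i = 0
    while i < len(text):
        match = text[i]  # default: a single character
        # Try the longest candidate first, down to two characters.
        for length in range(min(max_len, len(text) - i), 1, -1):
            candidate = text[i:i + length]
            if candidate in lexicon:
                match = candidate
                break
        words.append(match)
        i += len(match)
    return words


def word_frequencies(words):
    """Count how often each segmented word occurs."""
    return Counter(words)


if __name__ == "__main__":
    words = segment_fmm("我们喜欢学习中文", TOY_LEXICON)
    print(words)  # ['我们', '喜欢', '学习', '中文']
    print(word_frequencies(words))
```

Greedy matching is simple and fast, but it can mis-split ambiguous strings; statistical segmenters handle those cases better at the cost of more complexity.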

The analyzer generates three different scores.  The first is the overall Difficulty Score, which ranks the text on a scale of 1 to 10.  The algorithm for predicting difficulty is still being tweaked, so I won’t disclose it yet; I will discuss it in a later post.

The second score is the Old HSK difficulty score; it compares the frequency distribution of the words in the text to that of typical Old HSK test passages.  The scale runs from 0 to 4, corresponding to the HSK levels.

The third score is similar to the Old HSK score, but it is designed for the New HSK.  It is also derived by comparing the word frequency distribution of the text to that of a typical New HSK test passage.  The scale runs from 0 to 6.
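The post doesn't specify how the distributions are compared, so here is one plausible approach, offered as an assumption rather than DuZhe's method. It builds a profile of the text (the share of tokens at each vocabulary level, plus a bucket for off-list words) and picks the score whose reference profile is closest by L1 distance. The names `level_profile` and `nearest_level` are hypothetical, and the word levels and reference profiles in the example are invented toy numbers, not real HSK data.

```python
def level_profile(words, word_levels, num_levels):
    """Return the share of tokens at each vocabulary level 1..num_levels,
    followed by one final bucket for words not on any list."""
    counts = [0] * (num_levels + 1)
    for w in words:
        level = word_levels.get(w)
        if level is None:
            counts[num_levels] += 1  # off-list bucket
        else:
            counts[level - 1] += 1
    total = len(words) or 1  # avoid division by zero on empty input
    return [c / total for c in counts]


def nearest_level(profile, reference_profiles):
    """Return the score whose reference profile is closest to `profile`
    by L1 (sum of absolute differences) distance."""
    def distance(ref):
        return sum(abs(p - r) for p, r in zip(profile, ref))
    return min(reference_profiles,
               key=lambda score: distance(reference_profiles[score]))


if __name__ == "__main__":
    # Toy inputs invented purely for illustration (not real HSK data).
    toy_levels = {"我们": 1, "喜欢": 1, "学习": 1, "中文": 2}
    toy_references = {0: [1.0, 0.0, 0.0],
                      1: [0.7, 0.3, 0.0],
                      2: [0.4, 0.4, 0.2]}
    profile = level_profile(["我们", "喜欢", "学习", "中文"], toy_levels, 2)
    print(profile)  # [0.75, 0.25, 0.0]
    print(nearest_level(profile, toy_references))  # 1
```

Under this assumption, the same two functions would serve both the Old and New HSK scores; only the word-level lists and the reference profiles would change.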

You can experiment with the Pre-Alpha version here.

读者 (DuZhe) Text Analyzer: http://duzhe.aaginskiy.com