读者 (DuZhe) has reached a point where it is fairly usable, but it is probably a bit difficult to use without a decent guide, so I am writing one to help the early adopters.  The analyzer comes with three basic options for text analysis: Read, Analyze, and Segment.

Input Screen

Let’s get started by inserting some Chinese text to analyze.  If you don’t have any readily available, check out 盗墓笔记, and copy the first chapter here.

Segment

This option is very straightforward.  It simply returns the input text with spaces between the words.  For the example above, it will return something like what is shown below.  It won’t display any definitions or save word lists; it’s only for segmenting the text.

Text Segmentation

Segment Option
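
For readers curious what "spaces between the words" involves, here is a minimal sketch of one common approach, greedy forward maximum matching.  This is an assumption for illustration only, not necessarily how DuZhe segments text; the `segment` function and the toy dictionary are hypothetical.

```python
def segment(text, dictionary, max_len=4):
    """Greedy forward maximum matching (illustrative, not DuZhe's actual method)."""
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking down to a single character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            # A single character is always accepted as a fallback.
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                i += size
                break
    return words


# Toy dictionary; a real one would be built from a resource such as CC-CEDICT.
toy_dictionary = {"我", "喜欢", "读", "中文", "小说"}
print(" ".join(segment("我喜欢读中文小说", toy_dictionary)))
```

With this toy dictionary the sentence comes back as five space-separated words.  A greedy matcher like this can mis-split ambiguous text, which is one reason real segmenters tend to be more elaborate.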

Analyze

This option is a whole lot more interesting than the previous one.  It judges the perceived difficulty of the text relative to a typical HSK text of an equivalent level.  I’m still ironing out the kinks in the formula that calculates the difficulty level, so I’m not ready to disclose the details, but stay tuned.  When you get to the analysis page, you’ll be greeted with an overall difficulty score.

Difficulty Score

Below the difficulty score, you can see an analysis of the word frequencies.  Next is the breakdown of the HSK words in the text, both new and old.  The respective scores represent how well this text matches up with the difficulty of the HSK.

HSK Scores
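
To give a rough idea of what a word-frequency and HSK breakdown could look like, here is a small sketch.  It is not the actual DuZhe formula (which is not disclosed yet); `list_coverage` is a hypothetical stand-in that just measures what share of the words appear in a given list, and the sample list is a placeholder rather than a real HSK level.

```python
from collections import Counter


def word_frequencies(words):
    """Return (word, count) pairs, most frequent first."""
    return Counter(words).most_common()


def list_coverage(words, word_list):
    """Share of the words found in word_list (illustrative, not DuZhe's score)."""
    if not words:
        return 0.0
    return sum(1 for w in words if w in word_list) / len(words)


# Already-segmented sample text and a placeholder word list.
words = ["我", "喜欢", "读", "中文", "小说", "我", "读"]
sample_list = {"我", "喜欢", "读"}

for word, count in word_frequencies(words):
    print(word, count)
print(f"coverage: {list_coverage(words, sample_list):.0%}")
```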

Finally, on this page, you can see a list of all the words in the text.  The word table is paginated and can be filtered and searched.

Word List

Word List Table

Each row is selectable and the selected rows can be downloaded as a CSV file.

Download the Word List
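
If you want to work with the downloaded file in a script, the sketch below shows how a word list can be written to CSV and read back with Python’s standard `csv` module.  The column names (`word`, `pinyin`, `definition`) and the sample rows are assumptions for illustration; the actual columns in the DuZhe export may differ.

```python
import csv
import io


def word_list_to_csv(rows):
    """Serialize (word, pinyin, definition) rows to CSV text (assumed columns)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["word", "pinyin", "definition"])
    writer.writerows(rows)
    return buffer.getvalue()


# Illustrative rows only; the real export may use different fields.
rows = [
    ("盗墓", "dao4 mu4", "to rob a tomb"),
    ("笔记", "bi3 ji4", "notes, jottings"),
]

csv_text = word_list_to_csv(rows)
print(csv_text)

# Reading it back: the csv module handles the quoted comma in the definition.
for record in csv.DictReader(io.StringIO(csv_text)):
    print(record["word"], "->", record["definition"])
```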

Read

This is where the real fun starts.  The Read section lets you read the submitted text with the help of a mouseover dictionary.  The words are already segmented, so there is no need to highlight or select words; just mouse over them.

Mouseover Words for Definition

A cool feature is that the definition is always displayed at the bottom in a single row.  If a single row is not enough to show the whole definition, arrows appear to the left of the definition to indicate that it can be expanded by clicking the word.

Expandable Definition

Clicking the word will also bring up a menu that lets you add the word to a word list or flag it as inaccurate.  At the moment, flagging words won’t do anything because I have disabled it until I can get a better editor in place.

Add Word to List

The words will be added to a list that can be viewed from the menu, which is opened with the button next to the definition.

Show the Word List

This will pull up a table of all the words added to the list.  A word can be removed by hitting the ‘x’ button in its row.  The whole word list can be downloaded by clicking the down arrow.

Selected Word List

That is pretty much all there is to it at the moment.  The functionality, overall, is very straightforward.

Let me know how it works out for you and leave a comment!  I’m looking forward to any suggestions and bug reports.

  • http://buchmann.info Peter Buchmann

    This looks really interesting, congratulations! Is it online already?

    Also, thank you for sharing the RegEx code.

    I would be curious to know what kind of dictionary files and such you used to build this application.

    Regards from Zurich, Switzerland

    Peter

  • http://www.aaginskiy.com Artem Aginskiy

    It’s up at duzhe.aaginskiy.com, but it’s in the alpha stage, so it has some bugs. I’m working on it, but progress is somewhat slow because it is a side project I work on in my free time.

    I used CC-CEDICT for the dictionary; the regex will help parse that dictionary.