Wordlist utilities - wordlist analysis and statistics


The wordlist analyzer collects and displays the following statistics:

Common information:

  • Dictionary name
  • Size in bytes
  • File type
  • Last modified date and time
  • Whether the wordlist is sorted alphabetically (the check detects ascending order only)
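
The common information above can be approximated with a short sketch. It assumes a plain-text wordlist with one word per line and compares words byte by byte, which may differ from the collation the analyzer actually uses. The function names are hypothetical, and file type detection is left out.

```python
import os
import time


def is_sorted_ascending(words):
    """Return True if every word is >= the one before it (byte-wise order)."""
    return all(a <= b for a, b in zip(words, words[1:]))


def common_info(path):
    """Collect basic facts about a one-word-per-line wordlist file."""
    st = os.stat(path)
    with open(path, "rb") as f:
        words = f.read().splitlines()
    modified = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(st.st_mtime))
    return {
        "name": os.path.basename(path),
        "size_bytes": st.st_size,
        "modified": modified,
        # Assumption: only ascending order is tested, as in the list above.
        "sorted_ascending": is_sorted_ascending(words),
    }
```

Reading the file in binary mode keeps the sketch independent of the wordlist encoding; a locale-aware comparison would need the words decoded first.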

Word statistics:

  • Total words
  • Words with Latin characters only
  • Words with uppercase Latin characters only
  • Words with lowercase Latin characters only
  • Words consisting of digits only
  • Words consisting of special characters only
  • Words with non-printable characters
  • Words with non-Latin characters
  • Multi-word phrases, i.e. entries that contain words separated by spaces
  • Bytes per word, excluding the word delimiter. Shows the average wordlist compression ratio.
  • Bits per character. Shows the actual wordlist compression ratio. For example, in 16-bit Unicode wordlists the value tends toward 16 (not counting the word delimiter), and in regular ASCII wordlists toward 8. In some compressed PCD wordlists, a single letter can be encoded in less than 1 bit (see the screenshot).
  • Word length distribution: how many words consist of 1, 2, 3, etc. characters.
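
As a rough illustration of how such counters could be computed, here is a minimal sketch. It is not the analyzer's own implementation. It assumes that "Latin" means ASCII letters, that "special characters" means ASCII punctuation, that any code point above 127 counts as non-Latin, and that the size of an uncompressed wordlist equals the encoded length of its words. The name word_stats and its result keys are hypothetical.

```python
import string
from collections import Counter

LATIN = set(string.ascii_letters)
DIGITS = set(string.digits)
SPECIAL = set(string.punctuation)


def word_stats(words, encoding="utf-8"):
    """Classify words and compute size ratios for an uncompressed wordlist."""
    stats = Counter()
    lengths = Counter()   # word length -> number of words of that length
    payload_bytes = 0     # encoded size of all words, delimiters excluded
    total_chars = 0
    for w in words:
        if not w:
            continue
        chars = set(w)
        stats["total"] += 1
        lengths[len(w)] += 1
        payload_bytes += len(w.encode(encoding))
        total_chars += len(w)
        if chars <= LATIN:
            stats["latin_only"] += 1
            if w.isupper():
                stats["upper_only"] += 1
            elif w.islower():
                stats["lower_only"] += 1
        elif chars <= DIGITS:
            stats["digits_only"] += 1
        elif chars <= SPECIAL:
            stats["special_only"] += 1
        if any(not c.isprintable() for c in w):
            stats["non_printable"] += 1
        if any(ord(c) > 127 for c in w):
            stats["non_latin"] += 1
        if " " in w.strip():
            stats["multi_word"] += 1
    total = stats["total"]
    return {
        "stats": stats,
        "lengths": lengths,
        "bytes_per_word": payload_bytes / total if total else 0.0,
        "bits_per_char": 8 * payload_bytes / total_chars if total_chars else 0.0,
    }
```

For a compressed wordlist, the bits-per-character figure would instead be derived from the size of the file on disk, which is how values below 8 (or even below 1) can arise.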

Character frequency analysis (if the respective option is set):

  • Shows how frequently each character occurs in the wordlist
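
A character frequency table of this kind can be sketched in a few lines. The function name char_frequency is hypothetical, and the sketch assumes the words are already decoded into strings.

```python
from collections import Counter


def char_frequency(words):
    """Return (character, count, share) tuples, most frequent first."""
    counts = Counter()
    for w in words:
        counts.update(w)  # counts every character of the word
    total = sum(counts.values())
    return [(ch, n, n / total) for ch, n in counts.most_common()]
```

The share is the character's count divided by the total number of characters in the wordlist, so the shares across all characters sum to 1.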