© Copyright 2010 All Rights Reserved
The information on these pages is copyrighted by the author with all rights reserved. Reproduction of anything other than actual URL addresses without the author's permission is in violation of copyright laws.
Over time, I've been switching to programming editors more often than word processors. A programming editor can be faster and far more customizable. What I miss with this switch is some of the nicer writing tools word processors and desktop publishers offer, such as a spell-checker, a thesaurus, word and line counting and grammar checking. Since you can customize a programming editor, there's no reason you can't call other programs to offer this functionality in conjunction with an editor. I'll discuss some of the tools out there that you can use stand-alone or from within other editing programs to improve your documents.
Further information and links to some of the writing tools discussed below can be found on my Writer's Resources page in the Writing Tools section.
If you'd like to recommend some of your own writing tools or brainstorm other ways to help improve your documents, you're welcome to contact me via the WStorm mailing list.
Sometimes you need a word count. A magazine or other professional publication may ask for an article to be under a specific number of words. While most word processors have a way of getting word counts and line counts, you may not find that feature in your typical programming editor. You can use a program like wc to list the word count, line count and number of characters in your file. The wc program can be run stand-alone or called from a programming editor and given the current filename to evaluate. To run wc for word, line and character counts, the command syntax would be similar to:
wc -m -l -w filename
You can find copies of wc on most POSIX machines. For Windows users, there's a copy available with msys which is part of mingw. I also have more information on finding a version of wc on my Writer's Resources page.
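If wc isn't available on a machine, the same three counts are easy to approximate. The following is a minimal Python sketch, not a replacement for wc: the function names count_text and count_file are made up for illustration, and it assumes the file is UTF-8 text.

```python
def count_text(text):
    """Return (lines, words, chars), roughly what wc -l -w -m reports."""
    lines = text.count("\n")   # wc counts newline characters, not "rows"
    words = len(text.split())  # words are runs of non-whitespace
    chars = len(text)          # characters, not bytes
    return lines, words, chars


def count_file(path, encoding="utf-8"):
    """Count a file on disk. The UTF-8 default is an assumption."""
    # newline="" stops Python from translating line endings,
    # so the character count matches what is actually in the file.
    with open(path, encoding=encoding, newline="") as handle:
        return count_text(handle.read())
```

Calling count_file with the name of the file open in your editor returns the three numbers as a tuple, which a small wrapper script could print to the editor's output window.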
Some tools will autocorrect spelling errors as you're typing. There's a wonderful and very powerful utility on Windows called AutoHotkey that can automate common actions such as keystrokes and mouse clicks. Among the many things it can do, I found some scripts to autocorrect your spelling. The idea behind the scripts is to collect the most common spelling errors and correct them as you type. Are you always misspelling a certain word? You can add it to the script. If you have the script running with AutoHotkey, it works with any program: an e-mail client, a browser or even a programming editor. You can find more details on the topic here:
You can find links to 2 autocorrection scripts for AutoHotkey here:
Unfortunately, there is no port of AutoHotkey to other platforms. However, some industrious Linux/BSD users might want to look into using a program such as xdotool to do something similar.
My favorite general purpose dictionary, thesaurus, encyclopedia program is StarDict. When I wanted to integrate a spelling checker with an editor, my first thought was to investigate working with this very useful tool. If your programming editor allows you to look up the word under your cursor or a word you highlighted, you can call StarDict and send it the word. Some programming editors offer an extensible Help system or Help command that can be used in this way. You can also use StarDict's Scan option. When this is turned on, placing the mouse over words in a sentence one at a time will bring up any entries in selected dictionaries that correspond.
StarDict comes with a number of dictionaries, thesauri and encyclopedias. You can even write your own reference material for StarDict as long as it fits the format of a word or phrase plus a description. Two things I've been unable to find that would fit the StarDict format well are a rhyming dictionary and a style manual. If anyone runs across these, please let me know. You can contact me via the wstorm mailing list. I'd even be interested in finding text or XML versions of these types of references that could be converted to StarDict format. I did try writing a very short style manual of my own. I turned it on as the only reference for StarDict to search and ran my mouse over a line of text in Scan mode. It spotted any words that might be a style issue. It's not as good as a full-fledged grammar checker, but it is an option. You can write custom reference materials (dictionaries) for StarDict using Babylon format (a simple text format for records). There's a utility in the StarDict tools collection that's part of the StarDict project at the Google Code site that can be used to convert the file to StarDict's expected dictionary format.
Below is a sample entry from my style manual in Babylon format. You can contact me on the wstorm mailing list, if you'd like a copy of the other 11 entries. The first line of an entry has the word(s) you want to look up. The second line has the description. Separate the entries in your Babylon file with one blank line between each, including at the end of the file. Sample entry:
it's|its
Use it's when you can replace it in a sentence with the words it is.<br>Its is the possessive form of the pronoun it.
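If you keep your entries in some other form, a few lines of code can write them out in the layout described above. This is a minimal Python sketch based only on that description (headword line, description line, one blank line after every entry). The function name to_babylon is hypothetical, and anything the Babylon format allows beyond what is shown in the sample entry is not covered.

```python
def to_babylon(entries):
    """Format (headwords, description) pairs as Babylon-style text records."""
    records = []
    for headwords, description in entries:
        # A real line break would end the description early, so line
        # breaks are written as <br>, as in the sample entry.
        description = description.replace("\n", "<br>")
        records.append(f"{headwords}\n{description}\n")
    if not records:
        return ""
    # One blank line between entries, including after the last one.
    return "\n".join(records) + "\n"
```

The resulting text can be saved to a file and handed to the conversion utility from the StarDict tools collection.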
StarDict works on a variety of platforms. It's available at the Google Code site. There's also a portable apps version available from the same project. There is a command line version (sdcv) at Sourceforge that works on Windows and POSIX systems with the appropriate patches installed. The StarDict program uses the GTK+ screen library, which lets users customize look and feel through themes for all GTK+ based programs on a computer.
Most programming editors have some type of system to display the errors from a compiler in an output window and jump to the file and line where an error occurred when you click on or select the error message. In order to work with this technique, you need a program that can send errors to stdout and that uses an error syntax the editor recognizes. A common error syntax is that of the GNU gcc compiler. My plan for integrating checking programs with an editor was to find programs that could output to stdout and put messages in the GNU gcc compiler error syntax. Such programs can also be used from a command line if needed.
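To make the idea concrete, here is a minimal Python sketch of how an editor might recognize a message in the gcc style, which generally looks like file:line: message or file:line:column: message. The regular expression and the function name parse_error_line are my own illustration, not taken from any particular editor.

```python
import re

# file:line: message   or   file:line:column: message
ERROR_LINE = re.compile(
    r"^(?P<file>.+?):(?P<line>\d+):(?:(?P<col>\d+):)?\s*(?P<msg>.*)$"
)


def parse_error_line(text):
    """Return (file, line, column, message), or None if text doesn't match."""
    match = ERROR_LINE.match(text)
    if match is None:
        return None
    column = match.group("col")
    return (
        match.group("file"),
        int(match.group("line")),
        int(column) if column is not None else None,  # column is optional
        match.group("msg"),
    )
```

Any checking program that prints lines an expression like this can match gives the editor enough information to jump to the right file and line.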
I did not find any document checking programs currently outputting to that particular format. However, there was no reason I couldn't find a library to adapt and create a program that would. I wanted a C/C++ library so the finished tool would work quickly on older machines. I also wanted something that was easy to adapt or at least had a decently documented API.
Originally, I had no luck with my search for decent checker libraries. Inspired by the AutoHotkey autocorrection script, I wondered how hard it would be to write a spell-checker like that. Using Perl, I wrote a quick and dirty script to correct by exception (rather than show all words not found in a dictionary file). I ended up adding my Perl script to my programming editor's menu and let it check for commonly misspelled words in whatever file I had open.
There are 3 files to my spell-checker. There's a misspelling file which lists common errors. There's the Perl file that does a search in the current text document for every word in the misspelling file. There's also a misspelling results (or reason) file that tells what the correction is and why the original may be wrong. The Perl script simply searches for strings, so you can search for common misspellings, but you can also search for common grammar errors. You can add your own custom misspellings/errors to the files, so if there's a string you or someone you know often mistypes, the Perl script can catch it. You do have to be careful not to be too general in your misspellings. For instance, flagging 'an' to check if it has been misused in place of 'a' could flag every word with an in it. You can also add spaces to try to catch strings that are parts of full words (although that won't work with words separated by line breaks or punctuation instead of spaces). It's not as fancy as a full-fledged spelling or grammar checking program, but I do find it useful. If anyone's interested, I can upload the source for the 3 files.
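For readers who want to experiment with the correct-by-exception approach, here is a minimal sketch in Python. It is not the Perl script described above: it keeps the misspellings and their reasons in one dictionary instead of two files, the function name check_lines is hypothetical, and it matches on word boundaries rather than plain substrings so that an entry like 'an' is not found inside 'and'. Messages are printed in the gcc style so an editor can jump to them.

```python
import re


def check_lines(filename, lines, known_errors):
    """Yield a gcc-style warning for each known error found in lines.

    known_errors maps a misspelling (or error phrase) to the reason
    or correction to show for it.
    """
    patterns = [
        # \b limits matches to whole words; case is ignored.
        (re.compile(r"\b" + re.escape(wrong) + r"\b", re.IGNORECASE), reason)
        for wrong, reason in known_errors.items()
    ]
    for number, line in enumerate(lines, start=1):
        for pattern, reason in patterns:
            for match in pattern.finditer(line):
                column = match.start() + 1  # editors count columns from 1
                found = match.group(0)
                yield f"{filename}:{number}:{column}: warning: '{found}': {reason}"
```

Feeding it the lines of the open file and printing each result to stdout is enough for an editor that understands compiler error output.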
By looking at what other Open Source word processors used to do their spell-checking, I did eventually find a useable library. There were some drawbacks to it, so when I more recently found a second library that better fit my purpose, I made the switch to it.
Hunspell is the second library I found and I highly recommend it. While Hunspell didn't have the GNU error output style feature, it did offer output to stdout in various formats including one compatible with programs like sed. I decided to add some code to allow output in the format I needed. I posted the patch on the bug list at Sourceforge and was thrilled that one of the maintainers not only added the change, but improved on it to make it easier to work with via the command line. I was able to get Hunspell to compile on Windows using mingw with only a few minor modifications. I sent that change in as well as part of the patch. So, if you get a copy of Hunspell based on the code currently in CVS or a version newer than 1.2.9, you'll have a cross-platform spell-checking program ready to integrate with any programming editor that can handle compiler error output.
To run Hunspell or integrate it with your programming editor, use the command listed below. Give the filename to spell-check in place of %1. If you need another language, then switch the dictionary file to the one you want. Make sure hunspell and the dictionary file are in your path or add the appropriate file paths to your command line.
hunspell -d en_US -u3 %1
If you have a document in another format (such as HTML), you probably aren't going to want your spell-checker to flag markup language or programming language commands. Some spell-checkers offer filters to ignore this part of a document. One feature I really like about Hunspell is that it does offer this type of filtering. To spell-check an HTML document, simply add -H to the Hunspell command line. Check the help (--help) for other filter options.
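To show what such a filter has to do, here is a minimal Python sketch that keeps only the readable text of an HTML document so that tags, scripts and style sheets never reach a checker. It illustrates the general idea using Python's standard html.parser module and is not how Hunspell implements its -H option. The names TextOnly and visible_text are made up for the example.

```python
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Collect the readable text of an HTML document and drop the markup."""

    SKIPPED = {"script", "style"}  # contents of these are code, not prose

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Only keep text that is outside script and style elements.
        if not self._skip_depth:
            self.chunks.append(data)


def visible_text(html):
    """Return the text a reader would see, with all tags removed."""
    parser = TextOnly()
    parser.feed(html)
    parser.close()
    return "".join(parser.chunks)
```

A filter built this way would still need extra bookkeeping to report the original line numbers, which is one reason a checker with filtering built in is more convenient.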
I've found Hunspell to be a very well-designed program that easily builds cross-platform on various systems. No wonder I've seen so many reports of other programs (including LibreOffice and OpenOffice) now using this as their spell-checker of choice.
If you're having trouble finding a Windows version of Hunspell, you can try the openSUSE Build Service cross-compiler build. You'll have to download the appropriate rpm files. You can unpack the rpm files and the cpio files inside them with 7-Zip. When done, move the needed Hunspell related files (such as the exe and dlls) to the directory you want to run from. You'll also need at least one dictionary, available from the Hunspell site on Sourceforge.
The first C spelling library I located was Aspell, which was used by AbiWord and the LyX desktop publishing program at the time. They did use it in a cross-platform manner. I thought that it might make a good option for my purposes as well. It did not compile out of the box with mingw on Windows. I did eventually manage to get version 0.60.6 to compile. Figuring others would need the patches for a Windows version, I attempted to contact the software maintainer. I found out the maintainer is in no way interested in adding or maintaining a Windows port. The maintainer did leave it up to other projects such as LyX if they wanted to maintain their own ports or forks for other platforms such as Windows. Aspell has a filter option to skip programming language syntax and HTML markup, but I could find no documentation for the library on how to work with the feature. For all those reasons, I recommend going with Hunspell as one's spell-checker of choice to use with a programming editor or for other purposes.
If you still want to try to plug ahead with using Aspell as a spell-checker library, I'll detail the steps I took to create a working spell-check program, aspellstdout, that can output spell-checker information to stdout in GNU gcc compiler format. What's interesting to me is that I had mentioned my aspellstdout spell-checker program to one of the developers of the Geany programming editor in case their users might like to work with the utility as well. They ended up creating their own spell-checking plug-in for Geany just after that using the aspell library (although I never saw any mention of where they got the idea or the basis of their code from). You still might be able to spot some of the similarities between their code and mine.
To build aspell-0.60.6 on Windows, you'll need patches.
You can then build aspellstdout on any system you can build the aspell library, following the build steps given on my patches page. The source code for aspellstdout is also available there.
Dictionaries and other reference files take up a lot of space. It would be nice on low resource systems if these could be minimized as much as possible. I try to use just one dictionary program of choice on older machines and at the moment, my choice would be Hunspell. Lots of programs already make use of it and I'm hoping more will do so in the future. It would also be convenient if there was some way to further integrate or allow some interoperability between StarDict and Hunspell style file formats. Maybe someone can come up with a way for a spell-checking program like Hunspell to use or convert files from StarDict or xdxf format so that those dictionaries could be re-used as well.
I had thought about writing my own word/line counter and spelling and grammar checker, formatting the output like GNU gcc compiler errors so the messages would be reusable in most programming editors. I already have the lexical analyzer and parsing routines. The lexical analyzer could handle the word and line counting. The dictionary could be in XML (xdxf) format using indexing to speed up look-up. A lot of the new NoSQL style databases are doing simple keyword and value lookups just like this would need. It could contain correct spellings and parts of speech to use with the grammar parser. The hardest part would be finding a decent, simplified representation of the English (or any other) language in BNF or a similar augmented format. If anyone runs across BNF (or augmented) representations for spoken languages such as English, please let me know. You can contact me through the wstorm mailing list. However, the process would still take a good deal of time to write and I've been wanting something I could use now. So, as mentioned above, I've been looking for other tools to fill in these gaps in the meantime. The largest gap I still have is with grammar checking.
You can get some simple grammar checking capabilities with tools like StarDict or my Perl spell-checker as mentioned above. The other option is to find a C/C++ library for grammar checking, similar to what I did for spell-checking, and customize the output as required. I saw mention of one word processor using the Link grammar program. That might make a good option. I've run across one other C/C++ program for grammar checking at Sourceforge, but it was too complicated to build easily on the platforms where I needed it. If anyone does find a good library for grammar checking, feel free to contact me on the wstorm mailing list and let me know about it. I'd be willing to build another interface like aspellstdout for it if it's useable.
Though I use it more for programming or designing HTML pages, I often need to know how a document has changed over time. If I have the current version of an ASCII text document and an earlier version (such as a backup), I can compare the two. There are standard utilities on most operating systems such as fc and diff for doing so. However, they don't always make the differences between two files clear and apparent and they don't work well with long lines of text. The program diffh, available at Sourceforge, has become one of my favorite utilities and an indispensable file comparison tool. There's also a pre-compiled Windows version of diffh available. It does make use of the diff program found on POSIX machines. If you're on Windows, you can get a copy with msys which is part of mingw or from the gnuwin32 archives at Sourceforge. The diff program actually does the work of comparing the two files. Diffh's job is to make the output easily readable and it does so by converting the information from diff to HTML that is viewable in a standard web browser.
Since several programs are involved in order to see the final output, I found it easier to work with (and there's less typing involved) if I put all the commands in a batch or shell file and call that file when I need to make file comparisons. The script can be used stand-alone from a command line. However, you can also integrate it with a programming editor. All you need to do is pass the name of the file currently open in the programming editor and whatever file you want to compare it with to the script. Below is a sample batch file for invoking the tools on Windows. You can do something very similar for a bash shell script. Include the name of the browser you want to use to view the scitediff.html file if you're not on Windows. Make sure these tools and files are in your path or add the appropriate file paths to your command lines.
diff -u %1 %2 | diffh > scitediff.html
scitediff.html
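If diff and diffh aren't installed, a similar side-by-side HTML comparison can be produced with nothing but Python. This minimal sketch is not diffh: it swaps in the HtmlDiff class from Python's standard difflib module, so the output looks different. The function names and the 80 column wrap are my own choices. The scitediff.html default simply mirrors the batch file above.

```python
import difflib
import pathlib
import webbrowser


def html_comparison(old_text, new_text, old_name="old", new_name="new"):
    """Return a complete HTML page comparing two versions of a text."""
    differ = difflib.HtmlDiff(wrapcolumn=80)  # wrap long lines of prose
    return differ.make_file(
        old_text.splitlines(),
        new_text.splitlines(),
        fromdesc=old_name,
        todesc=new_name,
        context=True,  # show only the changed areas plus nearby lines
        numlines=3,
    )


def show_comparison(old_path, new_path, out_path="scitediff.html"):
    """Write the comparison to out_path and open it in the default browser."""
    old_text = pathlib.Path(old_path).read_text(encoding="utf-8")
    new_text = pathlib.Path(new_path).read_text(encoding="utf-8")
    page = html_comparison(old_text, new_text, str(old_path), str(new_path))
    out = pathlib.Path(out_path)
    out.write_text(page, encoding="utf-8")
    webbrowser.open(out.resolve().as_uri())
```

As with the batch file, an editor only has to pass the name of the open file and the file to compare it with, here as the two arguments to show_comparison.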
Be sure to check my Writer's Resources page for other useful stand-alone writing tools such as style and diction. There are also links to many of the resources I mentioned above.