Dictation for structured writing

I enjoy writing about how to do more things, more easily, with DITA. After many years in the structured writing field, I've come to appreciate the versatility and usability of the validating editors that support DITA so well. But I was surprised recently to rediscover how useful dictation is for initial content input in structured document formats--DITA tasks in particular.

This journey began last week when I investigated the use of Twitter as a means of initial content drafting for material intended to result in a DITA document. That post recounts my bemused observation that the most effective way to push chunks of knowledge into structured content was a dictation-like approach in which I used commands to tell the receiving system WHAT the input is. When you say "new step", you implicitly indicate several things: the associated text belongs in a task, in a <steps> section, and the first sentence of your related dictation will be the content of a <cmd> element. Anything that follows as regular text will flow into a subsequent <info> element. You are done with this step when you announce "new step", "new result", or "stop". There is logic in the command interpreter to perform this implicit boundary checking, much like the content validation rules in an XML editor.

Here is an example of a dialog that has enough information to construct a completely valid DITA task using the input provided:

;start task
;new title.Dictating a DITA task
;new paragraph.This is implicitly a short description.
There is a "new abstract" directive if you care to be precise, but
the system can figure out that this could be a shortdesc.
;new paragraph.The body of a task can't have a paragraph, so
the system can infer that this paragraph belongs in a context
section of the task.
;new step.Open a text editor. Any flat editor will do because you
are simply recording a dictation for yourself, as it were.
;new step.Write your dialog, using a small set of commands to identify
each type of content. New lines of content will concatenate to the last
type of content that you directed. In short, the system collects your
input into named buckets in the course of your dialog with it.
The "stop" command closes the collection of your dictation.
;new step.Save the file.
;new step.Process the saved file using the '''DictaDITA''' interpreter tool.
Whenever you are dictating by speech or text messaging into a
remote application, it will "listen" for the command to stop taking
dictation for the current document.
;new result.The system should inform you that a DITA task
corresponding to your dictation has been created and saved for you.
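
The inference rules that this dialog relies on can be sketched in a few lines of Python. This is only an illustration and not the actual DictaDITA code: the function name build_task, the placeholder id value, and the (command, text) input shape are all assumptions made for the example. It also assumes the commands arrive in the order the DITA task model expects (title, paragraphs, steps, result) and it leaves out the DOCTYPE declaration that a validating editor would want.

```python
import re
import xml.etree.ElementTree as ET


def build_task(buckets):
    """Build a DITA <task> element from (command, text) pairs.

    Hypothetical sketch: assumes commands arrive in content-model order.
    """
    task = ET.Element("task", id="dictated_task")  # placeholder id
    body = None           # <taskbody>, created on first use
    steps = None          # <steps>, created on the first "new step"
    seen_paragraph = False

    def taskbody():
        nonlocal body
        if body is None:
            body = ET.SubElement(task, "taskbody")
        return body

    for command, text in buckets:
        if command == "new title":
            ET.SubElement(task, "title").text = text
        elif command == "new paragraph" and not seen_paragraph:
            # The first paragraph is inferred to be the short description.
            ET.SubElement(task, "shortdesc").text = text
            seen_paragraph = True
        elif command == "new paragraph":
            # Later paragraphs are inferred to belong in <context>.
            context = taskbody().find("context")
            if context is None:
                context = ET.SubElement(taskbody(), "context")
            ET.SubElement(context, "p").text = text
        elif command == "new step":
            if steps is None:
                steps = ET.SubElement(taskbody(), "steps")
            step = ET.SubElement(steps, "step")
            # First sentence becomes <cmd>; any remainder becomes <info>.
            parts = re.split(r"(?<=[.!?])\s+", text, maxsplit=1)
            ET.SubElement(step, "cmd").text = parts[0]
            if len(parts) > 1:
                ET.SubElement(step, "info").text = parts[1]
        elif command == "new result":
            ET.SubElement(taskbody(), "result").text = text
        # Other commands ("start task", "stop") carry no content here.
    return task


if __name__ == "__main__":
    demo = build_task([
        ("start task", ""),
        ("new title", "Dictating a DITA task"),
        ("new paragraph", "This is implicitly a short description."),
        ("new step", "Open a text editor. Any flat editor will do."),
        ("new result", "A DITA task has been created."),
    ])
    print(ET.tostring(demo, encoding="unicode"))
```

The sketch creates each container element lazily, so the writer never names <taskbody>, <steps>, or <context>. The position of each command in the dialog is enough to place the content.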

The ;command. syntax of this dialog, by the way, results from the fact that natural language interaction ordinarily requires a "listener" application that interprets your speech as you are talking to determine whether you are in a command mode or not. If you are going to write or record your dictation for multiple playback through different interpreters or to store for later, you need to capture it instead as a file-based dialog. For that case, you need to separate your commands from your content with a "start of command mode" indication (any non-alphanumeric character) and a "return to content" indication (period or end of line).
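
As a rough sketch of that file-based syntax, the following Python splits each line into a command and its content and gathers continuation lines into the most recent bucket. The function names parse_line and collect are made up for this example, and the delimiter is fixed to the semicolon for simplicity, although any non-alphanumeric character could serve as described above.

```python
from typing import Iterable, List, Optional, Tuple


def parse_line(line: str, delimiter: str = ";") -> Tuple[Optional[str], str]:
    """Return (command, content); command is None for plain content lines."""
    line = line.rstrip("\n")
    if not line.startswith(delimiter):
        return None, line
    body = line[len(delimiter):]
    # The command runs up to the first period, or to end of line if none.
    command, _, content = body.partition(".")
    return command.strip(), content.strip()


def collect(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Gather dictated lines into (command, text) buckets until "stop"."""
    buckets: List[Tuple[str, str]] = []
    for raw in lines:
        command, content = parse_line(raw)
        if command == "stop":
            break  # closes the collection of the dictation
        if command is not None:
            buckets.append((command, content))
        elif buckets and content.strip():
            # Plain lines concatenate to the last directed type of content.
            name, text = buckets[-1]
            buckets[-1] = (name, f"{text} {content.strip()}".strip())
    return buckets


if __name__ == "__main__":
    sample = [
        ";start task",
        ";new title.Dictating a DITA task",
        ";new step.Open a text editor. Any flat editor will do because you",
        "are simply recording a dictation for yourself, as it were.",
        ";stop",
    ]
    for bucket in collect(sample):
        print(bucket)
```

Because only the first period after the delimiter ends command mode, later periods in the same line remain part of the content.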

I chose the semicolon character for the start delimiter because it turned out to be highly recognizable in speech-to-text apps on my smartphone, whereas ":" usually came through as "Colin." I quickly observed that nearly every non-letter character on an English soft keyboard has a multi-syllabic spoken name, which makes it slow to speak as a delimiter. "Period," "exclamation point," "question mark," "ampersand," "asterisk," "underscore"--what's going on in the English language?!? Only "dot" seems to fit the requirement for a delimiter whose name is as short as the character it represents, but speech interpreters always spell it out unless you say it as "dot com."

If you are writing the dictation instead of speaking it, you can choose to use a short form of commands. For example,

is short for the
;new paragraph.

There is not much previously published discussion available about the use of dictation for structured writing. Most software-based dictation systems assume that you are watching your conventional editing application as you interact with it, using voice commands to navigate the interface in order to insert elements or change the insertion/selection context. Of the dictation commands I was able to research, very few interact directly with parts of a document other than a paragraph--most are about interacting with the interface.

In that sense, by using interface commands you can do anything by dictation that you can do by using a mouse and keyboard. But the downside is that you are still interacting directly with the conventional editor, and you can't really do this over the phone, using a pocket recorder, using notepad on a borrowed laptop, or at an engineering workstation that doesn't have an XML editor installed by default. Even if you were able to collect that protocol as a file for replay, it would be verbose, and it would apply only for re-creating documents on that word processor (in other words, no other general reuse of the macro stream is feasible). Most importantly, you are still interacting directly with the validating (and often tag-aware) environment, which means that for creating something like a task, you still must have more than introductory-level DITA knowledge.

And how does this apply to technical writing? My quest has been to somehow enable non-tech-writer types--Subject Matter Experts such as engineers, programmers, even managers--to create the occasional input that you request of them, but 1) without requiring them to learn DITA, and 2) without your having to justify a costly editor license for them, given how rarely they create such content. This is why most SMEs resort to email or word processors for their writing, the formats that cause such agita for tech writers in an organization. Perhaps this is a role for dictation, using commands that can be printed on one side of a business card.

I added the support for steps to my dictation transform, and this is the result of processing the sample code into a valid, non-trivial DITA task (seen here in the XMLMind XML Editor):