Hi Bob. Thanks for the comments. The JSON file would be useful for several things.
First, if you want to match the audio to the lip movements in an animation, you need the start and end times of each word and the graphemes involved.
Second, if you want to play the audio and show the text as it is being read, either by printing it or by highlighting it, you need the same timing information.
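For the highlighting case, here's a rough Python sketch of how the timings could be pulled out. It assumes the standard Amazon Transcribe output layout (a `results.items` list where each spoken word has `start_time`, `end_time` and an `alternatives` list, and punctuation items carry no times). The function names `word_timings` and `word_at` and the sample data are made up for illustration.

```python
import bisect
import json


def word_timings(transcribe_json: str) -> list[tuple[float, float, str]]:
    """Return (start, end, word) for each spoken word, skipping punctuation."""
    data = json.loads(transcribe_json)
    timings = []
    for item in data["results"]["items"]:
        # Punctuation items have no start/end times, so skip them.
        if item.get("type") != "pronunciation":
            continue
        word = item["alternatives"][0]["content"]
        # Transcribe stores times as strings (seconds), so convert to float.
        timings.append((float(item["start_time"]), float(item["end_time"]), word))
    return timings


def word_at(timings: list[tuple[float, float, str]], t: float):
    """Return the index of the word being spoken at playback time t, or None."""
    starts = [start for start, _, _ in timings]
    # Find the last word that starts at or before t.
    i = bisect.bisect_right(starts, t) - 1
    if i >= 0 and t <= timings[i][1]:
        return i
    return None  # t falls in a pause between words


# Hypothetical sample shaped like Transcribe output.
sample = json.dumps({"results": {"items": [
    {"type": "pronunciation", "start_time": "0.0", "end_time": "0.4",
     "alternatives": [{"confidence": "0.99", "content": "Hello"}]},
    {"type": "punctuation",
     "alternatives": [{"confidence": "0.0", "content": ","}]},
    {"type": "pronunciation", "start_time": "0.5", "end_time": "0.9",
     "alternatives": [{"confidence": "0.81", "content": "world"}]},
]}})

timings = word_timings(sample)
print(timings[word_at(timings, 0.6)][2])  # world
```

A player would call something like `word_at` on each tick with the current playback position and highlight (or print) that word.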
Third, you could check the confidence level of each word (say, flag anything below 95%) and give the user the chance to correct the transcription by playing a portion of the audio before and after that word.
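Something like the following Python sketch could find those words. It assumes the same Transcribe item layout as above, where `confidence` is a string between 0 and 1 on the first alternative. The `threshold` and `pad` parameters (the 95% cut-off and the seconds of audio to replay on either side) are my own illustrative choices, not anything Transcribe defines.

```python
import json


def low_confidence_words(transcribe_json: str, threshold: float = 0.95,
                         pad: float = 1.0) -> list[tuple[str, float, float, float]]:
    """Return (word, confidence, replay_start, replay_end) for words below threshold."""
    data = json.loads(transcribe_json)
    flagged = []
    for item in data["results"]["items"]:
        # Only spoken words have timings worth replaying.
        if item.get("type") != "pronunciation":
            continue
        best = item["alternatives"][0]
        confidence = float(best["confidence"])
        if confidence < threshold:
            # Widen the window by `pad` seconds on each side; don't go below 0.
            # The end may run past the clip, so the player should clamp it.
            start = max(0.0, float(item["start_time"]) - pad)
            end = float(item["end_time"]) + pad
            flagged.append((best["content"], confidence, start, end))
    return flagged


# Hypothetical sample shaped like Transcribe output.
sample = json.dumps({"results": {"items": [
    {"type": "pronunciation", "start_time": "0.0", "end_time": "0.4",
     "alternatives": [{"confidence": "0.99", "content": "Hello"}]},
    {"type": "pronunciation", "start_time": "0.5", "end_time": "0.9",
     "alternatives": [{"confidence": "0.81", "content": "world"}]},
]}})

for word, conf, start, end in low_confidence_words(sample):
    print(f"{word!r} ({conf:.0%}): replay {start:.1f}s to {end:.1f}s")
```

Each flagged entry gives you the word to show the user and the span of audio to play back while they decide whether to correct it.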
If you just want the transcription itself and are going to do something else with it, like giving it back to the user to proofread and correct (e.g. medical transcription), then the plain transcript covers it. Incidentally, Transcribe and Comprehend work together to provide medical transcription.
Hope that answers your question!