Some contemplation on frame storage formats and clashes therein
I recently posted a little bit on how I now store contributed videotex (teletext and viewdata) frames within a database, so as to make accessing them far easier on the application side.
To do this, I had to decide exactly how to store the visible content of the frame. Everything else is easy: I created a secondary table holding key=>value pairs, which makes it very easy to expand, and any application needing particular data can go and look for its own without being confused by anything extra.
So. The frame content itself. Looking at existing storage formats didn't help much, as I have at least 17 types documented, plus others I know about. I may, however, have been influenced somewhat by them.
When you think about a viewdata frame or a teletext page, you automatically picture a static image of 23-25 lines by 40 columns. Almost every frame you will find that has been saved out by a terminal emulator, or captured from teletext, consists of those 920, 960 or 1000 bytes of data, sometimes with accompanying meta-data and sometimes not. I think every third-party viewdata host I have encountered so far also stored its pages this way. Individual characters took up a single byte each, as per their ASCII character code, and colour and control characters were also stored as a single byte. For teletext, these use the non-display codes below the space, since a teletext screen has no concept of cursor movements, carriage returns, etc., which is what those values are used for in a serial-terminal based service.
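As a rough illustration of how simple that fixed-matrix layout is to read, here is a minimal Python sketch that splits a headerless dump into 40-byte rows, inferring the row count from the file size. The name `split_raw_frame` is hypothetical, and it assumes no meta-data is attached to the file.

```python
COLS = 40
KNOWN_SIZES = {920: 23, 960: 24, 1000: 25}  # file size in bytes -> rows of 40 columns

def split_raw_frame(data: bytes) -> list[bytes]:
    """Split a headerless fixed-matrix dump into 40-byte rows (assumes no meta-data)."""
    if len(data) not in KNOWN_SIZES:
        raise ValueError(f"unexpected frame size: {len(data)} bytes")
    rows = KNOWN_SIZES[len(data)]
    return [data[r * COLS:(r + 1) * COLS] for r in range(rows)]

rows = split_raw_frame(bytes(960))
print(len(rows), len(rows[0]))  # 24 40
```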
Prestel, and viewdata generally, is however serial. Frames are sent to the user as ASCII characters, but the colour and control codes are sent as two-character command sequences: Escape, then a capital letter. So what might be stored in a teletext page as "<01>RED<02>GREEN<07>WHITE" would be sent to a viewdata terminal as "<ESC>ARED<ESC>BGREEN<ESC>GWHITE". Short lines would be terminated by a carriage return and linefeed, avoiding the need to send all 40 characters.
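A minimal Python sketch of that conversion for a single stored row. The name `encode_row` is my own, and the CR/LF rule (trim trailing spaces, then send CR/LF only if the row is shorter than 40 characters) is an assumption; it also ignores the case where trailing spaces carry a background colour.

```python
ESC = 0x1B
COLS = 40

def encode_row(row: bytes) -> bytes:
    """Turn one stored row (teletext-style codes below 0x20) into a viewdata stream."""
    body = row.rstrip(b" ")  # assumption: trailing spaces are not worth sending
    out = bytearray()
    for b in body:
        if b < 0x20:
            out += bytes([ESC, b + 0x40])  # e.g. 0x01 (alpha red) -> ESC 'A'
        else:
            out.append(b)
    if len(body) < COLS:
        out += b"\r\n"  # short line: CR/LF instead of padding out to 40 columns
    return bytes(out)

print(encode_row(b"\x01RED\x02GREEN\x07WHITE"))
# b'\x1bARED\x1bBGREEN\x1bGWHITE\r\n'
```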
Now... Prestel itself is known to have stored the frame data exactly as it would be sent to the user. There was a hard limit of 920 bytes available to the editor, and colour codes, etc., took up two of them each. This made creating complicated graphical pages somewhat difficult, as too many colour changes could quickly eat up the whole allocation. (Response frames were even worse; you only got 716 bytes to play with!) This is probably why all third-party viewdata servers stored their pages as the full 22x40 character image, with the control codes stored as per teletext. Doing this allowed for much more colourful and graphically rich content than was possible on Prestel itself; the conversion was done on transmission. The actual codes stored varied: some systems used 7-bit data throughout, some used letters with the top bit set to indicate that an Escape needed sending before that letter, some used 7 bits for visible characters and top-bit-set control codes (codes in the range 128-159), and at least one had everything with the top bit set!
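To put rough numbers on how quickly colour changes eat the allocation, the sketch below estimates the as-transmitted size of a fixed-matrix frame. It assumes each control code costs two bytes, each visible character one, and each short row an extra two for CR/LF; `transmitted_size` is a hypothetical helper, not anything Prestel provided.

```python
PRESTEL_LIMIT = 920  # bytes available to the editor for an ordinary frame
COLS = 40

def transmitted_size(rows: list[bytes]) -> int:
    """Estimate the as-transmitted size of a fixed-matrix frame (assumed costs)."""
    total = 0
    for row in rows:
        body = row.rstrip(b" ")  # trailing spaces are assumed not to be sent
        # control codes (below space) go out as ESC + letter: 2 bytes each
        total += sum(2 if b < 0x20 else 1 for b in body)
        if len(body) < COLS:
            total += 2  # CR/LF ends a short row
    return total

rows = [bytes([0x11, 0x7F]) * 20] * 22  # graphics red + solid block, 40 columns wide
print(len(rows) * COLS, transmitted_size(rows), transmitted_size(rows) > PRESTEL_LIMIT)
# 880 1320 True
```

So a frame that fits comfortably in the 880-byte matrix could need well over Prestel's 920 bytes once every colour change costs an extra Escape.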
So, fast forward 30 years, and I'm writing code to handle saved viewdata pages and display them on this new-fangled World Wide Web thing. There is zero support for viewdata and teletext format images, so we have to roll our own, converting saved pages in any number of formats into PNG or GIF images (GIF to account for flashing characters) that a web browser can display.
As an intermediate stage, I have to pull out that 22-24x40 matrix of characters before plotting them onto a graphics image to send to the viewer. I called this intermediate block of characters the "internal" format. It was 7-bit clean: the colour codes used the values below space, and everything else was a visible character.
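A minimal sketch of normalising the storage variants described earlier into that 7-bit internal form. The variant names are hypothetical labels of my own, and which one applies has to be known per source, since the same byte means different things in each.

```python
VARIANTS = {"seven_bit", "escaped_letter", "high_controls", "all_high"}

def to_internal(data: bytes, variant: str) -> bytes:
    """Normalise stored frame bytes to 7-bit: controls 0x00-0x1F, text 0x20-0x7F.

    Hypothetical variant labels:
      seven_bit      - already 7-bit, controls below space
      escaped_letter - a top-bit-set letter (0xC0-0xDF) means "ESC + that letter"
      high_controls  - controls stored as 0x80-0x9F, text as plain 7-bit
      all_high       - every byte has the top bit set
    """
    if variant not in VARIANTS:
        raise ValueError(f"unknown variant: {variant}")
    out = bytearray()
    for b in data:
        if variant == "escaped_letter" and 0xC0 <= b <= 0xDF:
            out.append(b - 0xC0)  # e.g. 0xC1 ('A' | 0x80) -> 0x01
        elif variant == "high_controls" and 0x80 <= b <= 0x9F:
            out.append(b - 0x80)  # e.g. 0x81 -> 0x01
        else:
            out.append(b & 0x7F)  # strip any top bit
    return bytes(out)
```

For example, 0xD2 is a plain "R" under `all_high` but control code 0x12 under `escaped_letter`, which is why the variant cannot be guessed from the bytes alone.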
For nearly ten years this worked fine, and this internal, intermediate format was the one I used when I created the page database.
It was only this week that I hit a problem with this, and it is down to a peculiarity in how Prestel stores Response Frames. (And, I assume, other frames that are not simple static pages.)
A response frame contains a number of fields that are defined by the editor when they create it. These are either filled in automatically by the Prestel server when it displays the page, or hold text or data entered by the user. When the user presses # on the last field, they are given the option to send (or not send) the page to the IP (Information Provider). It is then delivered to the IP's mailbox in its filled-in state.
When defining a response frame in the standard Prestel online editor, a field is specified by typing a sequence such as Ctrl-L n 30 Ctrl-L, which creates a field 30 characters long containing the subscriber's name; on pressing the second Ctrl-L the system displays 30 "n"s in the required position. The same procedure is repeated for any other field you request. What gets stored in the Prestel database is a single Ctrl-L followed by 30 "n"s.
When you retrieve a page from Prestel using the "Bulk" online editor, it is sent exactly as stored, so you get the Ctrl-L and sequence of letters alongside the Escape'd colour codes and CR/LFs for short lines. When uploading a replacement frame, you specify the layout in the same manner.
Those of you familiar with the standard ASCII control codes will recognise that Ctrl-L (0x0C) is also known as "Clear Screen", and is a character usually sent before the frame content. This is probably why it was used for this purpose: finding it in the middle of the frame content would not otherwise make sense, so it was re-purposed as a start-of-field flag. It is never actually sent to the user; it is replaced by a space when viewing on a terminal.
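As a sketch of what the as-stored form looks like to a program, the Python below finds fields by looking for a Ctrl-L (0x0C) followed by a run of one repeated letter, and swaps the Ctrl-L for a space for display. `find_fields` and `for_display` are hypothetical names; I'm assuming field-type letters are plain ASCII letters, and the offsets are positions in the stored byte stream, not screen positions.

```python
import re
from typing import NamedTuple

class Field(NamedTuple):
    offset: int  # byte offset of the Ctrl-L in the stored stream
    letter: str  # field-type letter, e.g. "n" for the subscriber's name
    length: int  # number of character cells reserved for the field

# Ctrl-L followed by one letter repeated. Caveat: if ordinary text directly
# after a field starts with the same letter, it would be counted as field too.
FIELD_RE = re.compile(rb"\x0c([A-Za-z])\1*")

def find_fields(stored: bytes) -> list[Field]:
    return [Field(m.start(), chr(m.group(1)[0]), m.end() - m.start() - 1)
            for m in FIELD_RE.finditer(stored)]

def for_display(stored: bytes) -> bytes:
    # The Ctrl-L itself is shown as a space on the user's terminal.
    return stored.replace(b"\x0c", b" ")

stored = b"\x1bFName: \x0c" + b"n" * 30 + b"\r\n"
print(find_fields(stored))  # [Field(offset=8, letter='n', length=30)]
```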
Now ...
I have two small databases in my possession that were pulled back down from Prestel at some point, and these include a number of Response Frames.
When I converted the data to my "internal" format to load it into the database, this normalised the control codes to 7-bit data, occupying the lower 32 values of the character table. On display, these codes were sent as <Esc><code + 64>, thus recreating the colour sequences.
When it came to a <Ctrl-L>, however, this was never stored in the database, because the normalisation routine ignored it. And even if it had been saved, on recall it would have been translated into <Esc>L, the sequence that ends double-height text.
So, to summarise: in most cases the normalisation I did lost the start-of-field character, because it wasn't expected in a frame. And if it did make it through, it would be indistinguishable from the "Single Height" code, and as that was allowed anywhere in a response frame, it couldn't be deduced from context.
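The clash is easy to demonstrate. In this minimal sketch, `expand_internal` stands in for a naive recall routine (the name is my own) that turns every code below space into <Esc><code + 64>; a start-of-field marker kept as 0x0C comes out exactly the same as the teletext normal-height attribute.

```python
ESC = 0x1B
CTRL_L = 0x0C         # Prestel's start-of-field marker (also "clear screen")
NORMAL_HEIGHT = 0x0C  # teletext spacing attribute that ends double height

def expand_internal(row: bytes) -> bytes:
    # Naive recall: every code below space is sent as ESC + (code + 64).
    out = bytearray()
    for b in row:
        out += bytes([ESC, b + 0x40]) if b < 0x20 else bytes([b])
    return bytes(out)

field_marker = expand_internal(bytes([CTRL_L]))
normal_height = expand_internal(bytes([NORMAL_HEIGHT]))
print(field_marker, normal_height, field_marker == normal_height)
# b'\x1bL' b'\x1bL' True
```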
I never noticed, because there were so few frames affected, and there was no need to process the fields the code indicated, anyway!
This last month, however, I've been working on a viewdata host program that will run on a modern server, which I could use to recreate the look and feel of using the original Prestel service. I've been testing it with an actual Prestel terminal, and it's been great fun! It was only when I stumbled across one of these response frames, and decided to support them, that I discovered this problem!
Looking into how other file formats solved this, it seems that at least one of them uses <Esc> itself as the field indicator. Stored in the database like that, it would expand on recall into a code sequence that viewdata doesn't use, so it is a suitable alternative. I will translate the affected pages, eventually!
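A minimal sketch of that approach, assuming the host decodes one stored row (with its CR/LF already removed) into the internal form: Ctrl-L becomes <Esc> (0x1B) in storage, and on recall a field-aware host sends a space in its place. Even a naive <Esc><code + 64> expansion would only produce <Esc>[ for it. The function names are hypothetical.

```python
ESC = 0x1B
CTRL_L = 0x0C
FIELD_MARK = ESC  # start-of-field stored as ESC in the internal form

def stored_to_internal(row: bytes) -> bytes:
    """Decode one as-stored Prestel row (CR/LF already removed) to internal bytes."""
    out = bytearray()
    i = 0
    while i < len(row):
        b = row[i]
        if b == ESC:
            nxt = row[i + 1] if i + 1 < len(row) else 0
            if 0x40 <= nxt <= 0x5F:
                out.append(nxt - 0x40)  # ESC 'A' -> 0x01, ESC 'L' -> 0x0C
            i += 2                      # a malformed ESC pair is simply dropped
            continue
        out.append(FIELD_MARK if b == CTRL_L else b)  # Ctrl-L no longer clashes with 0x0C
        i += 1
    return bytes(out)

def internal_to_terminal(internal: bytes) -> bytes:
    """Expand internal bytes for a terminal, treating ESC as start-of-field."""
    out = bytearray()
    for b in internal:
        if b == FIELD_MARK:
            out.append(0x20)            # the field start shows as a space
        elif b < 0x20:
            out += bytes([ESC, b + 0x40])
        else:
            out.append(b)
    return bytes(out)

internal = stored_to_internal(b"\x1bAName\x0cnnn\x1bL")
print(internal)                        # b'\x01Name\x1bnnn\x0c'
print(internal_to_terminal(internal))  # b'\x1bAName nnn\x1bL'
```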
So, a decision taken about 10 years ago came back to bite me this week. And it's all to do with 25 year old data in a file format determined 40 years ago that everyone else decided needed to be done differently.
Well done for making it this far!
As an aside... Prestel added support for "Dynamic frames", which were basically frames that could contain cursor movement characters. This meant you could go back and change things after you had already drawn them. This was easy for Prestel, as it stored data in as-transmitted form anyway. It's not so easy for host software that expects its frames to be stored in a fixed matrix! I'll be working on this once I find some original examples...
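For what it's worth, a first attempt might replay the stream onto a fixed matrix while tracking the cursor. This sketch assumes the usual viewdata cursor codes (0x08 left, 0x09 right, 0x0A down, 0x0B up, 0x0C clear screen, 0x0D start of line, 0x1E home) and that the cursor wraps at the screen edges; `render_stream` is a hypothetical name, and other codes are simply ignored.

```python
ROWS, COLS = 24, 40
ESC = 0x1B

def render_stream(stream: bytes) -> list[bytearray]:
    """Replay a viewdata stream, cursor movements included, onto a 24x40 matrix."""
    screen = [bytearray(b" " * COLS) for _ in range(ROWS)]
    row = col = 0

    def step_right() -> None:
        nonlocal row, col
        col += 1
        if col == COLS:  # wrap onto the next row (assumed behaviour)
            col, row = 0, (row + 1) % ROWS

    i = 0
    while i < len(stream):
        b = stream[i]
        i += 1
        if b == ESC and i < len(stream):
            screen[row][col] = stream[i] & 0x1F  # ESC 'A' (0x41) -> stored as 0x01
            i += 1
            step_right()
        elif b == 0x08:  # cursor left, wrapping to the previous row
            col -= 1
            if col < 0:
                col, row = COLS - 1, (row - 1) % ROWS
        elif b == 0x09:  # cursor right
            step_right()
        elif b == 0x0A:  # cursor down
            row = (row + 1) % ROWS
        elif b == 0x0B:  # cursor up
            row = (row - 1) % ROWS
        elif b == 0x0C:  # clear screen and home
            screen = [bytearray(b" " * COLS) for _ in range(ROWS)]
            row = col = 0
        elif b == 0x0D:  # carriage return: start of current row
            col = 0
        elif b == 0x1E:  # home
            row = col = 0
        elif 0x20 <= b <= 0x7F:  # printable: write and advance
            screen[row][col] = b
            step_right()
    return screen

screen = render_stream(b"\x0cHELLO\r\x09X")  # go back and overwrite the "E"
print(bytes(screen[0]).rstrip())  # b'HXLLO'
```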
Labels: coding, prestel, projects, retrochallenge, teletext, viewdataviewer