Dear Friends,
I have finished my conversion work on the Studer list archive, and have just uploaded the files to the "Files" section of the Yahoo Studer group, where I am cross-posting this:
http://tech.groups.yahoo.com/group/STUDER/
As many of you already know, the Studer list is a remarkable source of information, but has been more or less impossible to use in the unformatted form in which it has been downloaded.
Converting these posts into a readable form has turned out to be much more time-consuming and difficult than I could ever have imagined for several reasons, among them the fact that the structure of the Majordomo format changed over time, and that individual email messages themselves contain multiple levels of quotations of previous messages, and thus may have record delimiters inside them. It has taken a couple of weeks of work with GREP search-and-replace and "regular expressions" in TextWrangler to delicately tease out the individual posts, especially as there are all sorts of other anomalous records that don't share the header sequence and structure of the messages around them.
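For anyone curious how the "delimiter inside a quotation" problem can be handled, here is a minimal Python sketch. It is not the TextWrangler procedure described above. It assumes, purely for illustration, that a record delimiter is a line of exactly 30 hyphens, and it accepts such a line as a real boundary only when the next non-blank line begins with "Date:" or "From:". The delimiter, the pattern and the name split_posts are all hypothetical.

```python
import re

# Assumption: a delimiter is a line of exactly 30 hyphens (hypothetical).
# It counts as a record boundary only when the next non-blank line starts
# with "Date:" or "From:". Quoted copies ("> -----...") never match because
# the pattern is anchored at the start of the line.
RECORD_START = re.compile(
    r"^-{30}[ \t]*\n(?:[ \t]*\n)*(?=(?:Date|From):)",
    re.MULTILINE,
)


def split_posts(archive_text):
    """Split raw archive text into individual posts, dropping empty pieces."""
    parts = RECORD_START.split(archive_text)
    return [part.strip() for part in parts if part.strip()]
```

A delimiter-like line that is quoted, or that is followed by ordinary text rather than a header line, stays inside the post it belongs to.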
Happily, I had the help of some very capable friends, and was able to shape the archive into a uniform state as a massive text file, from which I imported everything into FileMaker Pro. Even though I've used FileMaker for years, I still needed a FileMaker programmer to write a short script that would parse out the header from the content of each post. To improve the signal-to-noise ratio, I have decided to keep just four lines of the original header with each message: "Date", "From", "To" and "Subject". Once inside FileMaker, the original email messages became individual records, and the project became much less unwieldy. Since quotations are often repeated and bounced back and forth in numerous messages, I thought that it would be useful if in the future we could refer unambiguously to a given message, and so I serialized the records in FileMaker, from 1 to 12,214. I also inserted a record at the end of each of the seven original blocks of downloaded text, and these can be quickly found by searching for my name as a string, "Christopher Campbell", or "end of text file". A FileMaker search showed about 100 records that didn't have complete header fields, and since "Subject" seems especially important, I reconstructed subjects for all of these, sometimes by searching back for quoted text in order to identify the thread title of which the post was evidently a part.
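For readers without FileMaker, the same three steps (separate header from content, keep only the four header lines, assign serial numbers) can be sketched in Python. This is not the FileMaker script mentioned above. It assumes each post is a header block followed by a blank line and then the body, and it ignores folded (multi-line) headers. The function names are hypothetical.

```python
KEPT_FIELDS = ("Date", "From", "To", "Subject")


def parse_post(raw_post):
    """Return (kept header fields, body) for one post.

    Assumption: the header ends at the first blank line.
    """
    header_text, _, body = raw_post.partition("\n\n")
    fields = {}
    for line in header_text.splitlines():
        name, sep, value = line.partition(":")
        # Keep only the first occurrence of each of the four fields.
        if sep and name in KEPT_FIELDS and name not in fields:
            fields[name] = value.strip()
    return fields, body


def build_records(raw_posts):
    """Number the posts from 1 upward and attach the kept header fields."""
    records = []
    for serial, raw_post in enumerate(raw_posts, start=1):
        fields, body = parse_post(raw_post)
        records.append({"serial": serial, **fields, "body": body})
    return records


def missing_subject(records):
    """Serial numbers of records whose Subject is absent or empty."""
    return [r["serial"] for r in records if not r.get("Subject")]
```

Calling missing_subject on the result of build_records gives the list of records that would need a subject reconstructed by hand.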
In doing this work, I was keenly aware that the Studer list is not only dense with precise technical information, it is also an historical record, and so I was exceedingly careful not to make any intentional changes to the content. Unfortunately, there are about 400 "rogue" records which are more or less randomly distributed through the archive, and which lost either their original line breaks, or their quote level formatting, or both, in the process of being extracted from the Majordomo system, i.e. when they were originally archived as text blocks. Quite a number of these records came through with no line breaks at all, and required that I insert them manually. To signal that I had to intervene in these records to an unusual extent, I inserted a line at the beginning of each such record, "[Note: original line breaks and quote level formatting may have been lost]". Please recall that if there is ever a question about some aspect of a record having been inadvertently changed, you can always go back to the original archive and compare the raw and formatted versions. By the way, if any reader discovers any such problem, please let me know and I will make the necessary corrections (note: the easiest way to contact me is through the "Contact" tab on my web site). TextWrangler reports that the Studer list, all told, contains 25,852,742 characters, 3,810,550 words, and 654,350 lines, so it would be a minor miracle if every message escaped all the necessary processing unscathed!
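As an illustration only, records that lost their line breaks could be spotted automatically and given the same note line. The sketch below assumes that any line longer than 500 characters is a sign of lost line breaks. That threshold and the function names are hypothetical, and the actual repair of the line breaks was done by hand, as described above.

```python
NOTE = "[Note: original line breaks and quote level formatting may have been lost]"


def looks_unwrapped(body, max_line_length=500):
    """True if any line is longer than the (assumed) threshold."""
    return any(len(line) > max_line_length for line in body.splitlines())


def add_note(body):
    """Put the note line at the top of a record, once only."""
    if body.startswith(NOTE):
        return body
    return NOTE + "\n" + body
```

Because add_note checks for an existing note first, running it twice on the same record does not duplicate the line.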
It has been harder than I might have thought to settle upon a final format for this archive, as we may each wish to work with it differently. For this reason, I have processed it into a variety of forms:
1_Studer-list_complete.txt (two files, _01 to _02)
This is the complete text of all seven sections, concatenated with record separators, after all the regex processing was completed, but before being imported into FileMaker, so it contains all the header information, but no serial numbers. The format is plain text, and it should be searchable using any text editor. Yahoo maximum file size limits required that it be divided into two files.
2. Studer-list_messages.txt
This version, with record serial numbers, was generated from the FileMaker Pro database, and contains the complete message content, but from the original header retains only date, from, to and subject. The format is plain text.
3. Studer_list_PDF (eight files, from _01 to _07)
To make the files accessible in a platform-agnostic mode (I'm a die-hard Apple guy), and to keep individual file sizes manageable, I converted the finished text files into rich text format (formatted as Verdana 11), and then saved them as PDF documents. I don't think any of us will be printing these out in their entirety, however, as that would require more than 9,000 pages. Yahoo maximum file size limits required that section 6 be divided into two files, parts A and B.
4_Studer_list_Filemaker.fp7
The easiest and most efficient way to work with the archive is surely with a full-fledged database such as FileMaker Pro. In FileMaker one can search for and create subsets of records, and I have created additional fields where one can flag messages that have been read, and store keywords and personal notes. There are layouts for record detail, a list view, and one showing all fields, so one sees every line of the header. As the zipped FileMaker Pro document is 28 MB, I am hosting it, at least for now, under the following link:
http://www.cbcampbell.com/external_content/Studer_list_Filemaker/4_Studer_list_Filemaker.fp7.zip
Even though my primary work is as a visual artist, for the last twenty years I have overseen a small, private archive of spoken voice work from the 1950s-60s, and have recently been performing domain transfers on hundreds of open-reel recordings with two Revox decks, a 2-track C270 (purchased new in 1991), and a 4-track C274 (purchased used and refurbished just recently). It is more than a little ironic, given the final scope of this Studer project, that I don't even own a Studer deck (yet). On the other hand, I am quite certain that gradually reading my way through the archive will be the best possible preparation for understanding the implications of choosing various models, and using and maintaining them into the future. So, let the fun begin!