
Overview

The initial transcription of Occom Circle documents produces a text document with custom markup that indicates interesting aspects of the text. The elements of the document fall into several categories:

  1. Major structural elements such as pages, paragraphs, line breaks, and opening and closing sections.
  2. Typographic elements such as underlines, superscripts, and large or small characters.
  3. Conceptual elements including names of people, places, organizations and dates.

Markup

Structure

The structure of a letter includes the pages of the letter, the paragraphs and line breaks, and key elements such as the opening, closing and postscript. Not all letters include each of these elements, and elements that are not present in the letter may be omitted from the transcription; however, those that do appear must appear in the order listed below. These elements are labeled in the document as follows:

Element Markup Description
Page == Page image_number == Must be placed at the start of each page, including the first line of the document. The image number is optional, and may be omitted if unknown. Each page in the document must be included in order in the transcription, even if the page is blank. If a page break occurs between paragraphs, a blank line must precede and follow the page marker. If the break occurs within a paragraph, the blank lines must be omitted.
Opening Date: [[date July 3, 1897]]
Salutation: Dear Mr. Smith,
The opening of a letter includes the date and salutation. These should be the first elements in the document, following the first page marker. The date and salutation are placed on separate lines, preceded by the words "Date:" and "Salutation:".
Body == body == The body marks the end of the opening and the start of the main body of the letter.
Closing == closing ==
Salutation: Sincerely,
Signature: John Doe
The closing of a letter includes the closing salutation and the signature.
Postscript == postscript == If any text appears after the closing of the letter, it should be included as a postscript. The contents of the postscript should be transcribed in the same manner as the main body of the letter.
Trailer == trailer == Text that appears as a closing title or footer of the letter, after any postscript.
Address == address == The address to which the letter was sent.
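Because the section markers must appear in a fixed order, that order can be checked mechanically. The Python sketch below is a hypothetical illustration, not the project's validation script: it assumes each marker is a whole line of the form "== name ==" (optionally with an image number after "page"), that keywords are case-insensitive, and that page markers may appear anywhere. The function name check_structure is invented for this example.

```python
import re

# Assumed order of the section markers, taken from the table above.
# Page markers may appear anywhere, so they are skipped when checking order.
SECTION_ORDER = ["body", "closing", "postscript", "trailer", "address"]

# A whole-line marker such as "== body ==" or "== page 764475-3_001 ==".
MARKER = re.compile(r"^==\s*(\w+)(?:\s+\S+)?\s*==\s*$", re.IGNORECASE)


def check_structure(text):
    """Return a list of problems with the section markers in a transcription."""
    problems = []
    lines = text.splitlines()
    first = MARKER.match(lines[0]) if lines else None
    if not first or first.group(1).lower() != "page":
        problems.append("line 1: document must start with a page marker")
    last_index = -1  # position in SECTION_ORDER of the previous section marker
    for number, line in enumerate(lines, start=1):
        match = MARKER.match(line)
        if not match:
            continue
        name = match.group(1).lower()
        if name == "page":
            continue
        if name not in SECTION_ORDER:
            problems.append(f"line {number}: unknown section '{name}'")
            continue
        index = SECTION_ORDER.index(name)
        if index <= last_index:
            problems.append(f"line {number}: '{name}' is out of order")
        last_index = index
    return problems
```

For example, a document in which "== closing ==" comes before "== body ==" would be reported as out of order, while omitted sections are simply not checked.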

Text Blocks

Blocks of text (lines and paragraphs) should be typed as they appear in the document. A paragraph should be followed by one or more blank lines unless it is the last paragraph in a section.

Line Breaks line 1
line 2
Line breaks are denoted by line breaks in the text.
Paragraphs paragraph 1

paragraph 2
Paragraphs are denoted by a blank line in the text.
Indented text      indented line Lines that are indented should begin with a tab or space(s).

Typography

Typographic markup is enclosed within [[ ... ]] brackets. Immediately following the opening brackets is a keyword indicating the kind of markup, then a space, then the text that is marked up. The text may contain other markup, including structural markup such as line breaks, paragraphs or page breaks as described above. Every pair of opening brackets must have a corresponding pair of closing brackets.

Example: (note the use of spaces between ]] and ]] to improve legibility)

this is normal text [[bold this text will appear bold [[italic this will be bold italic text ]] ]] this is normal text

Rendered as:

this is normal text this text will appear bold this will be bold italic text this is normal text
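The nesting rule can be illustrated with a short Python sketch that strips the tags to produce the rendered text and reports unbalanced brackets. It is a hypothetical helper (render_plain is not part of the IMS); it assumes every tag opens with "[[", a keyword and a space, and it does not handle the single-bracket guesses used by the illegible-text markup.

```python
import re

# Assumption: a tag opens with "[[" immediately followed by a keyword and a space.
OPEN_TAG = re.compile(r"\[\[(\w+) ")


def render_plain(text):
    """Strip typographic tags, checking that every "[[keyword " has a matching "]]"."""
    out = []
    depth = 0  # number of tags currently open
    i = 0
    while i < len(text):
        match = OPEN_TAG.match(text, i)
        if match:
            depth += 1
            i = match.end()
        elif text.startswith("]]", i):
            if depth == 0:
                raise ValueError(f"unmatched ']]' at offset {i}")
            depth -= 1
            i += 2
        else:
            out.append(text[i])
            i += 1
    if depth:
        raise ValueError(f"{depth} tag(s) left open")
    # Collapse runs of spaces left by the optional space before "]]".
    return re.sub(r" {2,}", " ", "".join(out)).strip()
```

Applied to the example above, this returns the rendered sentence; an unclosed "[[bold" raises a ValueError instead.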

Element Markup Description
Block Letters [[blockletters block lettered text appears here]]
abbreviated form: [[block block lettered text appears here]]
Rendered as block lettered text appears here
bold [[bold bold text appears here]] Rendered as bold text appears here
italic [[italic italic text appears here]] Rendered as italic text appears here
underline [[underline underlined text appears here]] Rendered as underlined text appears here
large letters [[large large text appears here]] normal text here Rendered as large text appears here normal text here
superscript normal text [[superscript superscript text appears here]]
abbreviated form: [[sup superscript text appears here]]
Rendered as normal text superscript text appears here

Illegible or Missing Text

The markup for illegible text depends on the reason for the illegibility: gaps in the text, damage to the document, or merely illegible text are each indicated with one of the tags listed below. If a reasonable guess about the original text can be made, enclose the guess within [ ... ] after the first ] bracket. More than one guess may be supplied, each enclosed within its own single brackets.

Example: (note the use of spaces between ] and [ to improve legibility)

[[illegible ] [my first guess] [my second guess]]

Element Markup Description
Gaps, missing or entirely unreadable text. [[gap reason] [my first guess] [my second guess]] The kind of gap is indicated by the reason, which must be one of the following keywords:
  • blotted_out
  • editorial_insertion
  • faded
  • illegible
  • lacuna
  • malformed
  • omitted
  • page_torn
If none of these reasons is applicable, please contact the project manager for assistance.
Blotted text [[blotted ] [my guess]] Obsolete. Use [[gap blotted_out]] instead.
Damaged text [[damaged reason] [my guess]] Obsolete. Use [[gap reason]] instead.
Illegible text [[illegible] [my guess]] Use for any illegible text that does not fall into the other categories.
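A gap tag and its optional guesses can be parsed and the reason checked against the list above. This Python sketch is an illustration only, with a hypothetical function name; the regular expression assumes the spacing shown in the examples and that guesses contain no square brackets of their own.

```python
import re

# Reasons allowed by the list above.
GAP_REASONS = {"blotted_out", "editorial_insertion", "faded", "illegible",
               "lacuna", "malformed", "omitted", "page_torn"}

# "[[gap reason]" followed by zero or more " [guess]" groups and a final "]".
GAP = re.compile(r"\[\[gap (\w+)\]((?:\s*\[[^\[\]]*\])*)\s*\]")
GUESS = re.compile(r"\[([^\[\]]*)\]")


def parse_gaps(text):
    """Yield (reason, guesses) for each gap tag; raise on an unknown reason."""
    for match in GAP.finditer(text):
        reason = match.group(1)
        if reason not in GAP_REASONS:
            raise ValueError(f"unknown gap reason: {reason!r}")
        yield reason, GUESS.findall(match.group(2))
```

For instance, "[[gap faded] [my first guess] [my second guess]]" yields the reason "faded" with both guesses, and "[[gap blotted_out]]" yields an empty guess list.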

Notes

Notes are informational text added by the transcriber. They may be used as markers for parts of the text that require further review, or to describe any other aspect of the transcription that the transcriber wishes to record.

Element Markup Description
Note [[note kind text of the note]]
e.g. [[note editorial check the transcription of this paragraph]]
The note kind should be one of the following words:
  • general_note
  • editorial
If none of these note kinds is applicable please contact the project manager for assistance.
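Since the note kind is always the first word after "[[note ", unknown kinds are easy to spot. A minimal Python sketch, assuming only the two kinds listed above; bad_note_kinds is a hypothetical name.

```python
import re

# Kinds allowed by the list above.
NOTE_KINDS = {"general_note", "editorial"}
# The kind is the first word after "[[note ".
NOTE_TAG = re.compile(r"\[\[note (\w+)")


def bad_note_kinds(text):
    """Return the kinds used in note tags that are not in the allowed list."""
    return [kind for kind in NOTE_TAG.findall(text) if kind not in NOTE_KINDS]
```

A note written without a kind, such as "[[note this word ...]]", is reported with "this" as its kind.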

Changes

Areas where the author has crossed out or added text are marked with the change tags below. Changes may be nested if warranted: if, for example, the author indicated that text was to be added at a place in the text but crossed out part of the added text, mark this as a deletion within the addition.

Element Markup Description
Added/Inserted text [[add location this is inserted text]] The author indicated that text should be inserted into the main text at this point. The location indicates where the author made the notation, and should be one of the following words:
  • above - above the line
  • below - below the line
  • bottom - in the bottom margin
  • end - at the end of the document
  • inline - within the body of the text
  • left - in the left margin
  • opposite - on the opposite facing page
  • overleaf - on the other side of the page
  • right - in the right margin
  • top - in the top margin
If the notation is in multiple locations, list each appropriate keyword separated by an underscore (e.g., top_left). If none of these locations appears applicable, contact the project manager for assistance.
Deleted/Crossed out text [[delete this text was deleted]]
abbreviated form: [[del this text was deleted]]
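Combined locations such as top_left can be validated by splitting on the underscore and checking each part. The Python sketch below is a hypothetical helper, assuming the location is always the first word after "[[add ".

```python
import re

# Location keywords from the list above.
ADD_LOCATIONS = {"above", "below", "bottom", "end", "inline", "left",
                 "opposite", "overleaf", "right", "top"}

# The location is the first word after "[[add " (underscores included).
ADD_TAG = re.compile(r"\[\[add (\w+) ")


def valid_add_location(location):
    """True if every underscore-separated part is a known location keyword."""
    return all(part in ADD_LOCATIONS for part in location.split("_"))


def invalid_add_locations(text):
    """Return the locations used in add tags that are not valid."""
    return [loc for loc in ADD_TAG.findall(text) if not valid_add_location(loc)]
```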

Concepts

Conceptual elements are dates, people, places, and organizations that are significant and may have additional information attached to them in the final website. Mark these items so that they can be indexed and cataloged, and transcribe them within the given tags exactly as the author wrote them.

Element Markup Description
Dates [[date July 3, 1884]]
Person [[person Sampson Occom]]
Place [[place Hanover, NH]]
Organization [[organization Dartmouth College]]
abbreviated form: [[org Dartmouth College]]
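Because concept tags often contain nested markup (for example "[[person M[[superscript r.]] Occom]]"), collecting them for an index needs a small stack rather than a single regular expression. The Python sketch below is a hypothetical illustration of that idea, not the validation form's own code; it treats "org" as an abbreviation of "organization" and keeps only the plain text inside each tag.

```python
import re
from collections import defaultdict

# Assumption: a tag opens with "[[" followed by a keyword and a space.
OPEN_TAG = re.compile(r"\[\[(\w+) ")
# Full and abbreviated keywords for conceptual elements.
CONCEPTS = {"date": "date", "person": "person", "place": "place",
            "organization": "organization", "org": "organization"}


def index_concepts(text):
    """Collect the plain text of each concept tag, grouped by kind."""
    index = defaultdict(list)
    stack = []  # one [keyword, collected characters] pair per open tag
    i = 0
    while i < len(text):
        match = OPEN_TAG.match(text, i)
        if match:
            stack.append([match.group(1), []])
            i = match.end()
        elif text.startswith("]]", i) and stack:
            keyword, chars = stack.pop()
            inner = "".join(chars)
            if keyword in CONCEPTS:
                index[CONCEPTS[keyword]].append(inner.strip())
            if stack:
                stack[-1][1].append(inner)  # pass the text up to the enclosing tag
            i += 2
        else:
            if stack:
                stack[-1][1].append(text[i])
            i += 1
    return dict(index)
```

With this approach "[[date 25[[superscript th]] Aug[[superscript t]] 1764]]" is indexed as "25th Augt 1764".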

Special Symbols

Special symbols are enclosed in double less-than (<<) and double greater-than (>>) signs.

Element Markup Description
m̅ <<mb>>
<<m bar>>
"m bar" representing an abbreviation or repeated letter m (m with combining overline, Unicode u0305)
⅌ <<per>> "Per" symbol (Unicode u214c)
ſ <<ls>>
<<long s>>
Long 's' (Unicode u017f)
⁓ <<sd>>
<<swung dash>>
Swung dash (Unicode u2053)
< <<lt>> Less-than sign
> <<gt>> Greater-than sign
[ <<[>> Left square bracket
] <<]>> Right square bracket
arbitrary character <<uxxxx>> Arbitrary Unicode character with code xxxx, where xxxx is the hexadecimal code value
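The table above amounts to a lookup from codes to characters, plus a rule for <<uxxxx>>. A minimal Python sketch of decoding, for illustration only: decode_symbols is a hypothetical name, and it accepts four to six hex digits after "u" as an assumption about code points beyond four digits.

```python
import re

# Named symbols from the table above, mapped to the characters they represent.
NAMED_SYMBOLS = {
    "mb": "m\u0305", "m bar": "m\u0305",
    "per": "\u214c",
    "ls": "\u017f", "long s": "\u017f",
    "sd": "\u2053", "swung dash": "\u2053",
    "lt": "<", "gt": ">", "[": "[", "]": "]",
}

SYMBOL = re.compile(r"<<([^<>]+)>>")


def decode_symbols(text):
    """Replace <<name>> and <<uxxxx>> codes with the characters they represent."""
    def replace(match):
        code = match.group(1)
        if code in NAMED_SYMBOLS:
            return NAMED_SYMBOLS[code]
        if re.fullmatch(r"u[0-9a-fA-F]{4,6}", code):
            return chr(int(code[1:], 16))  # hexadecimal code value
        raise ValueError(f"unknown special symbol: <<{code}>>")
    return SYMBOL.sub(replace, text)
```

In a real pipeline this step would run after the bracket markup has been parsed, since <<[>> and <<]>> exist precisely so that literal brackets are not mistaken for markup.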

Journal/Ledger Transcription

Accounting journal or ledger pages that consist of a table of transactions are most easily transcribed in Excel. After transcription, the files must be saved as tab-delimited text files before being sent to the validation script for translation. To help the translation script recognize the documents, the first row of the document must contain only the text "== document table ==" in the first cell. Since Excel treats a leading "=" as the start of a formula, this must be entered in the cell with a leading apostrophe:

'== document table ==

The second row must contain the identifier for the first page:

'== page 764565_001 ==
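The two header rows can be checked on the exported tab-delimited text before submission. This Python sketch is a hypothetical pre-check, not part of the validation script. It assumes Excel normally drops the apostrophe prefix on export, but strips a leading apostrophe anyway in case one survives.

```python
import csv
import io
import re

# Page marker with an optional image number, e.g. "== page 764565_001 ==".
PAGE = re.compile(r"==\s*page(?:\s+\S+?)?\s*==", re.IGNORECASE)


def first_cell(row):
    """First cell of a row, without a leading apostrophe or surrounding spaces."""
    return row[0].lstrip("'").strip() if row else ""


def check_table_header(tsv_text):
    """Check the two header rows required at the top of a table transcription."""
    rows = list(csv.reader(io.StringIO(tsv_text), delimiter="\t"))
    problems = []
    if (not rows or first_cell(rows[0]) != "== document table =="
            or any(cell.strip() for cell in rows[0][1:])):
        problems.append("row 1 must contain only '== document table ==' in its first cell")
    if len(rows) < 2 or not PAGE.fullmatch(first_cell(rows[1])):
        problems.append("row 2 must begin with a page marker such as '== page 764565_001 =='")
    return problems
```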

The table layout and the way Excel works place additional constraints on the document:

Tags may not span cells

IMS markup must be entirely contained within a cell. This is actually as much an XML restriction as an IMS restriction. XML tags must be properly nested, and dates, deletions, or other tags spanning cells would not be properly nested. When markup appears to cross cells, consider one of the following alternatives:

  1. Don't apply the markup. For example, if a date spans cells, it may not be necessary to apply a date tag to the element, especially if there is a similar or identical tag nearby.
  2. Reapply the markup to the contents of each cell individually. If text is crossed out across an entire row, apply the delete markup to the contents of each cell individually.
  3. Reformat the content so it fits in one cell and mark it up as usual. In some instances text spans cells in the original only out of convenience: perhaps the text was too wide for its column and overlapped the next one, or perhaps it was never intended to fit a column. If so, consider combining it into one cell and applying the markup within that cell.

This restriction and the alternatives apply to all markup that might span cells in the table. For example, if an entire row is crossed out, the "del" markup must be repeated in each cell in the row; the IMS does not support starting the "del" markup at the beginning of the row and ending it at the end. TEI P5 supports this concept with the "delSpan" element, but it cannot be entered directly via the IMS.
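Because every tag must open and close within one cell, each cell of the exported file can be checked on its own. The Python sketch below is a hypothetical pre-check: it counts "[[" and "]]" pairs per cell (ignoring the single brackets of gap guesses) and also flags cells containing line breaks, which the rule that cells are single lines forbids.

```python
import csv
import io
import re

# Opening and closing bracket pairs; single brackets (gap guesses) are ignored.
TOKEN = re.compile(r"\[\[|\]\]")


def is_balanced(cell):
    """True if every "[[" in the cell is closed by a later "]]" in the same cell."""
    depth = 0
    for token in TOKEN.findall(cell):
        depth += 1 if token == "[[" else -1
        if depth < 0:  # a "]]" with no opening pair in this cell
            return False
    return depth == 0


def problem_cells(tsv_text):
    """Return (row, column, reason) for cells that break the per-cell rules."""
    problems = []
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    for r, row in enumerate(reader, start=1):
        for c, cell in enumerate(row, start=1):
            if not is_balanced(cell):
                problems.append((r, c, "unclosed or unmatched tag"))
            if "\n" in cell or "\r" in cell:
                problems.append((r, c, "line break in cell"))
    return problems
```

Row and column numbers are 1-based, matching how the cells appear in Excel.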

Cells may not span rows or columns

Cells may not span multiple rows or columns. This is a limitation of the export format as well as the IMS. Use notes to tell reviewers that a cell should span rows or columns; the Text Markup Unit will then adjust the XML to add the appropriate attributes and remove the extra cells inserted by the IMS.

Cells are single lines

The text in individual cells may not contain any line breaks. This is an IMS restriction imposed by the export format. Again, notes may be used to clarify the text if desired.

Special characters must be entered using codes

When saving tab-delimited text, Excel does not export UTF-8; it writes each character as a single byte. Files that include non-ASCII symbols directly are therefore incompatible with the UTF-8 encoding required by the IMS. To work around this, use the special character markup, with either named symbols or Unicode character codes. For example, the pound symbol (£) must be entered with its Unicode code value: <<u00a3>>.
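If cell text is prepared elsewhere before being pasted into Excel, non-ASCII characters can be converted to codes automatically. A minimal Python sketch, with a hypothetical function name; it writes at least four lowercase hex digits, so characters above U+FFFF would get more than four, a case worth checking with the project manager.

```python
def encode_non_ascii(text):
    """Replace every non-ASCII character with a <<uxxxx>> special-symbol code."""
    return "".join(ch if ord(ch) < 128 else f"<<u{ord(ch):04x}>>" for ch in text)
```

For example, "£5" becomes "<<u00a3>>5", and an m with a combining overline becomes "m<<u0305>>".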

Mixed documents are not possible

Each document submitted to the IMS must be either a table-style or a letter-style document; the two structures cannot be mixed. If a source contains both, transcribe the parts as separate documents; they can be assembled into a single XML document later, perhaps by the TMU.

Example Document

The following is a fictitious example document based on letter 764475-3 to demonstrate how a marked up document should appear.

This example is for illustration purposes only. The text does not accurately reflect the completed markup of the original letter.
== Page 764475-3_001 ==
Date: [[place Lebanon]] [[date 25[[superscript th]] Aug[[superscript t]] 1764]]
Salutation: [[person M[[superscript r.]] Occom]],




Sir.

== body ==
    Your time is so that, and your Business so 
crowding, that I can't desire such an Addition to your Bur—
den, as your coming hither again would be: I therefore take this 
Way to hint to you what I would say more fully if you were here.

    And in the first place, I suspect you will miss of 
seeing [[person Mr. Kirtland]] on his Return from [[person Mr. Whitefield]], and also 
of seeing [[person Mr. Whitefield]], who I hear preached some weeks ago 
at [[place Philadelphia]], & consequently you will miss of receiving any supplys 
which he may have got for your journey; and if so, I advise you 
to represent the Case to some able Friends at [[place New York]], and if you 
can get Supply no other Way, hire the Money of Some good Friend 
till you return.

    I herewith Send you a Copy of our Commission 
from [[place Scotland]] in order that you may shew it, if you shall have 
occasion, to [[person Gen[[superscript l]] Gage]], [[person Gen[[superscript l]] Johnson]], or others.

    I would have you obtain 15 or 20 youth, if you
can procure those [[delete that]][[add above which]] are likely, of remote Tribes of Indians.
And if you hear that which is encouraging of good [[person Peter]] at 
[[place Onohoquagee]], and those two Boys there who were offered to 
the Comissioner at [[place Boston]], let them be of the Number.

    There was also an English Lad with the Mohawks 
to learn their Tongue, before this War, who I hear is very likely: 
if you can obtain such an one, do it. I shall leave the Proportion 
of Girls to you, & [[person Gen[[superscript l]] Johnson]], whose advice I would have you 
take in every Thing, when it may be had.

    And be sure, you let all the Children whom you 
bring, know that they don't come here to be without Government, 
nor to live a lazy sordid Life, but to be fitted for Business and 
lifefulness in the World. And I am not afraid that you Should boast 
of my Mohawk Boys Proficiency in very strong Terms.

    And don't fail to write to me as your Progress, 
Success, and any Occurrence that may be entertaining, by every 
    opportunity [[note editorial this word is found at the right margin, continuing the previous line]]
== page 764475-3_002 ==
Opportunity, as you know Friends at Home will be glad to hear.

    Send me an Acco[[superscript t]] of what Labour you have or Shall hire 
upon my Credit [[add above at Mohegan]]; and what you desire me to do for your [[add above Family]] while 
you are gone.

    And may the God of all Grace be with you & [[person David]] in 
all the way whither you go, and inspire you with Wisdom, 
Prudence, Zeal, Courage, and holy Fortitude, and honour you 
to be the Instrument to spread the Saviour of his Name, and 
the Knowledge of the great Salvation, far among the Pagans.

== closing ==
Salutation: Remember me respectfully to Friends in your Way, espe—
cially at [[place N. York]]. — which with Love &c is the needful 
from

    Yours affectionately

Signature: [[person Eleazar Wheelock]].

[[person Rev[[superscript d]] M[[superscript r]] Occom]]

== postscript == 
[[date August 27th]]  P.S. [[person Mr. Kirtland]] returned last Evening has got no money. 
[[person Mr. Whitefield]] is at [[place N. York]]. talks of going to [[place Albany]] this Week 
if he can he will serve you, if he cant acquaint [[person M[[superscript r.]] Whitaker]] — 
do the best you can — 
          

Template Document

When you start a new transcription, use the template below to get started. Delete any parts of the template that are not applicable to the letter you are transcribing. Remember to fill in the image number for pages if you know the corresponding image number.

== page ==
Date:
Salutation:
== body ==
== closing ==
Salutation:
Signature:
== postscript ==
== trailer ==
== address ==
        

Applications

The transcriptions must be saved as plain text (.txt) documents without any application specific markup. Word (.doc or .docx) or RTF (.rtf) documents are not acceptable. Documents may be edited in Word or other text editors provided they are saved as "text-only" documents. Please review the instructions below that apply to the application you are using to ensure that you are saving the documents in the correct format.

Microsoft Word (Mac - 2011)

Word is able to edit and save text documents, but the process is relatively complex compared to other editors. We recommend using Word only if no other editor is available.

Create a new document as follows:

  1. Choose File > New Document
  2. Choose File > Save As...
  3. Choose Format: Plain Text (.txt)
  4. Click Save (the File Conversion dialog box will appear)
  5. Under Text Encoding, choose Other Encoding and click Unicode 5.1 UTF-8
  6. Under End Lines with, choose LF only
  7. Make sure "Insert line breaks" and "Allow character substitution" are not checked
  8. Click Ok.

Notes:

  1. You will receive a warning each time you save the document indicating that some formatting may be lost. Word displays this warning even with empty documents and even if the document contains no formatting. You must click "Save" to save the document.
  2. Word will not remember the "Save" settings, so the next time you save the document, the Text Encoding and line endings will not be correct. You must use "Save As" each time you save the document starting with Step 2 above in order to save the document properly.
  3. While editing the file, you must take care not to use any Word formatting such as bold, italic or different fonts. This information will not be saved in the text file. Use only the markup described in this document.

TextEdit (Mac)

TextEdit is an application supplied with the Macintosh operating system. It is capable of editing plain text (.txt) and RTF (.rtf) documents, and is much easier to use than Word for plain text documents. It is found in the "Applications" folder.

To create a new plain text (.txt) document in TextEdit:

  1. Choose File > New
  2. Choose Format > Make Plain Text (see notes)
  3. Choose File > Save to save the document.
  4. Set Plain Text Encoding to Unicode (UTF-8) (this should be the default, and is only necessary the first time you save the file)

Notes:

  1. In the TextEdit preferences you can choose whether the New command creates a plain text or RTF document by default. If a document is an RTF document, TextEdit displays a ruler at the top of the page. If you do not see the ruler and cannot find the "Make Plain Text" command, you already have a plain text document. While you are working on the Occom Circle project, we recommend setting this preference so that TextEdit automatically creates new plain text documents.
  2. The font and type size for plain text documents are set in the TextEdit preferences. You may choose any font and size you find convenient.

Notepad++ (Windows)

Notepad++ (free)
http://notepad-plus-plus.org/
  1. Download Notepad++ from notepad-plus.org/news/notepad-6.1.8-release.html.
  2. When you first open Notepad++, it creates a new file by default (usually named “new 1”).
  3. Open the “Encoding” menu at the top and choose “Encode in UTF-8”.
  4. To save the file, choose File > Save As… and name the file.
Notepad (comes with Windows)
All Programs > Accessories > Notepad
Notepad defaults to ANSI encoding. When you use "Save As", choose UTF-8 from the Encoding menu; do not choose "Unicode", which saves the file as UTF-16 rather than UTF-8.

Word (Windows)

Word 2010 on Windows 7
Choose the No Spacing style
Save As "Plain Text (*.txt)"
A File Conversion dialog box then appears, warning that formatting will be lost. Under Text encoding, choose Other encoding and select Unicode (UTF-8). The same dialog box lets you end lines with something other than the default of CR|LF. The choices are:

CR|LF
CR only
LF only
LF|CR

Choose LF only.

Other Applications

Other applications are available for the Macintosh and PC that are capable of editing plain text files. If you wish to use something other than one of the applications listed above, please consult with the project manager before beginning to ensure that your application is compatible with the system.
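Whatever editor is used, the saved file can be checked for the properties described in this section: plain text rather than RTF, valid UTF-8, and LF-only line endings as chosen in the Word instructions above. The Python sketch below is a hypothetical self-check, not the validation form. Whether the validator actually rejects CR|LF files is not stated here, so treat that message as a warning.

```python
from pathlib import Path


def check_text_bytes(data):
    """Return problems suggesting the bytes are not plain UTF-8 text with LF endings."""
    problems = []
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        problems.append(f"not valid UTF-8 (first bad byte at offset {err.start})")
    if b"\r" in data:
        problems.append("contains carriage return (CR) characters; save with LF only line endings")
    if data.lstrip().startswith(b"{\\rtf"):
        problems.append("looks like an RTF document rather than plain text")
    return problems


def check_text_file(path):
    """Read a saved transcription from disk and check it."""
    return check_text_bytes(Path(path).read_bytes())
```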

Validation and Submission

After the document transcription is complete, the document must be validated, translated to XML and sent to the project manager. This is accomplished through the Occom Circle Validation Form.

If a document is valid, the form displays a color-coded version of the document to facilitate proofreading the markup, and a list of key elements (people, places, organizations) found in the document. After the document is proofread, a link on the page emails the original transcription and the translated XML document to the project manager.

If markup errors are found in the document, a list of problems is displayed with the line number and surrounding text for each error. These must be corrected before the document can be submitted to the project manager.

Post-Translation Tasks

The translation to TEI markup leaves placeholders for certain elements that must be filled in manually: