Overview

The Initial Markup Scheme (IMS) for TEI text projects provides a means for rapid transcription of the majority of markup in simple text documents. The IMS is easier to read and type than pure XML and is readily converted into TEI XML. After conversion the resulting XML document can be proofread and enhanced as required by an individual project to produce a final XML document of any desired complexity.

The elements of a document that can be represented by the IMS fall into several categories:

  1. Major structural elements such as pages, chapters, subchapters and other major divisions.
  2. Text level structural elements including paragraphs and line breaks.
  3. Typographic elements such as underlines, superscripts, large or small characters.
  4. Conceptual elements including names of people, places, organizations and dates.

Markup

Structure

The structure of a document includes chapters, subsections, and page breaks. Not all documents will include all of these elements. These structural elements are indicated as follows:

Element Markup Description
Document Header == Document document_type == Different kinds of documents have different structures. The document header indicates to the validation system what kind of document it should expect. See the Document Structure section for details on the possible values of "document_type". If no Document Header is found, the document is assumed to have a type value of "prose".
Page == Page image_number == Must be placed at the start of each page, including the first line of the document. The image number is optional, and may be omitted if unknown. Each page in the document must be included in order in the transcription, even if the page is blank. If a page break occurs between paragraphs, a blank line must precede and follow the page marker. If the break occurs within a paragraph, the blank lines must be omitted. The document number is assumed to be all of the characters in the image_number preceding the first "_".
Section == Section section_type == A section describes a major structural division of a document. Each section has a section_type which identifies the name of the section and implicitly indicates how the section relates to other sections in the document. Documents may have any number of sections, but sections follow a rigid structure which permits only certain kinds of sections at any given point in the document. The "document_type" specified in the Document Header determines which sections may appear in the document.
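
The marker format above can be recognized mechanically. The following is a minimal Python sketch, not the validation system's actual code: the names MARKER_RE, parse_marker and document_number are hypothetical. It assumes that keywords are matched case-insensitively, since the examples in this guide use both "Page" and "page".

```python
import re

# Matches a structural marker line such as "== Page ms794-001-001 ==".
# Assumption: the keyword is case-insensitive and the argument is optional.
MARKER_RE = re.compile(
    r"^==\s*(document|page|section)\b\s*(.*?)\s*==\s*$", re.IGNORECASE
)


def parse_marker(line):
    """Return (kind, argument) for a structural marker line, or None."""
    match = MARKER_RE.match(line.strip())
    if match is None:
        return None
    return match.group(1).lower(), match.group(2)


def document_number(image_number):
    """Return the characters preceding the first "_" in an image number.

    If there is no "_", the whole string is returned; the guide does not
    say what should happen in that case, so this is an assumption.
    """
    return image_number.split("_", 1)[0]


print(parse_marker("== Page 764475-3_001 =="))
print(document_number("764475-3_001"))
```

A page marker without an image number, such as "== page ==", yields an empty argument string.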

Text Blocks

Blocks of text (lines and paragraphs) should be typed as they appear in the document. A paragraph should be followed by one or more blank lines unless it is the last paragraph in a section.

Line Breaks
Line breaks are denoted by line breaks in the text. Example:

line 1
line 2

Paragraphs
Paragraphs are denoted by a blank line in the text. Example:

paragraph 1

paragraph 2

Indented text
Lines that are indented should begin with a tab or space(s). Example:

     indented line
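
To illustrate how blank lines separate paragraphs while line breaks are preserved, here is a small Python sketch. The function name split_paragraphs is hypothetical, and the sketch ignores page markers and other structural markup; it only shows the blank-line rule.

```python
def split_paragraphs(text):
    """Split text into paragraphs, each a list of its lines.

    One or more blank lines end a paragraph. Leading whitespace on a line
    is kept, because indentation is meaningful in the IMS.
    """
    paragraphs = []
    current = []
    for line in text.splitlines():
        if line.strip():
            current.append(line)
        elif current:
            paragraphs.append(current)
            current = []
    if current:
        paragraphs.append(current)
    return paragraphs


sample = "line 1\nline 2\n\n\tindented line\n"
print(split_paragraphs(sample))
```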

Typography

Typographic markup is enclosed within [[ ... ]] brackets. Immediately following the opening brackets is a keyword indicating the kind of markup, then a space, then the text that is marked up. The text may contain other markup, including structural markup such as lines, paragraphs or page breaks as described above. For each pair of opening brackets, there must be a corresponding pair of closing brackets.

Example: (note the use of spaces between ]] and ]] to improve legibility)

this is normal text [[bold this text will appear bold [[italic this will be bold italic text ]] ]] this is normal text

Rendered as:

this is normal text this text will appear bold this will be bold italic text this is normal text

Element Markup Description
Block Letters [[blockletters block lettered text appears here]]
abbreviated form: [[block block lettered text appears here]]
Rendered as block lettered text appears here
bold [[bold bold text appears here]] Rendered as bold text appears here
italic [[italic italic text appears here]] Rendered as italic text appears here
underline [[underline underlined text appears here]] Rendered as underlined text appears here
large letters [[large large text appears here]] normal text here Rendered as large text appears here normal text here
superscript normal text [[superscript superscript text appears here]]
abbreviated form: [[sup superscript text appears here]]
Rendered as normal text superscript text appears here
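
The pairing rule for [[ and ]] can be checked before a document is submitted. The sketch below is an illustration in Python, not the validation form's implementation; check_double_brackets is a hypothetical name. It looks only at double brackets and does not interpret keywords or the single-bracket guesses used for illegible text.

```python
def check_double_brackets(text):
    """Return a list of problems with [[ ... ]] pairing in text."""
    problems = []
    open_positions = []  # offsets of "[[" that have not been closed yet
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair == "[[":
            open_positions.append(i)
            i += 2
        elif pair == "]]":
            if open_positions:
                open_positions.pop()
            else:
                problems.append(f"unmatched ]] at offset {i}")
            i += 2
        else:
            i += 1
    problems.extend(f"unmatched [[ at offset {pos}" for pos in open_positions)
    return problems


line = "normal [[bold bold [[italic bold italic ]] ]] normal"
print(check_double_brackets(line))
```

An empty list means every opening pair has a matching closing pair.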

Illegible Text

Markup of illegible text depends on the reason for the illegibility: Gaps in the text, damage to the document, or merely illegible text are indicated with one of the tags listed below. If a reasonable guess about the original text can be made, this can be indicated within the tag by enclosing the guess within [ ... ] after the first ] bracket. More than one guess may be supplied if desired, by enclosing each within single brackets.

Example: (note the use of spaces between ] and [ to improve legibility)

[[illegible ] [my first guess] [my second guess]]

Element Markup Description
Gaps, missing or entirely unreadable text. [[gap reason] [my first guess] [my second guess]] The kind of gap should be indicated by the reason. It must be one of the following statements:
  • blotted_out
  • editorial_insertion
  • faded
  • illegible
  • lacuna
  • malformed
  • omitted
  • page_torn
If none of these reasons is applicable, please contact the project manager for assistance.
Illegible text [[illegible] [my guess]] Use for any illegible text that does not fall into the first two categories.
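
As an illustration of the gap syntax, the following Python sketch pulls the reason and any guesses out of a single gap tag and rejects reasons that are not in the list above. The names GAP_REASONS, GAP_RE and parse_gap are hypothetical, and the sketch assumes that guesses contain no nested markup.

```python
import re

# The reasons listed in this guide.
GAP_REASONS = {
    "blotted_out", "editorial_insertion", "faded", "illegible",
    "lacuna", "malformed", "omitted", "page_torn",
}

# "[[gap reason]" followed by zero or more "[guess]" groups and a final "]".
GAP_RE = re.compile(r"\[\[gap\s+(\w+)\s*\]((?:\s*\[[^\[\]]*\])*)\s*\]")


def parse_gap(tag):
    """Return (reason, guesses) for one gap tag, or raise ValueError."""
    match = GAP_RE.fullmatch(tag.strip())
    if match is None:
        raise ValueError(f"not a gap tag: {tag!r}")
    reason = match.group(1)
    if reason not in GAP_REASONS:
        raise ValueError(f"unknown gap reason: {reason!r}")
    guesses = re.findall(r"\[([^\[\]]*)\]", match.group(2))
    return reason, guesses


print(parse_gap("[[gap faded] [my first guess] [my second guess]]"))
```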

Notes

Notes are informational text added by the transcriber. They may be used as markers for parts of the text that require further review, or to describe any other aspect of the transcription that the transcriber wishes to record.

Element Markup Description
Note [[note kind text of the note]]
e.g. [[note editorial check the transcription of this paragraph]]
The note kind should be one of the following words:
  • general_note
  • editorial
If none of these note kinds is applicable please contact the project manager for assistance.

Changes

Text that the author has crossed out or added is marked as a change. Changes can be nested if warranted. If, for example, the author indicated that text was to be added at some place in the document but crossed out part of the added text, this should be indicated by a deletion within the addition.

Element Markup Description
Added/Inserted text [[add location this is inserted text]] The author indicated that text should be inserted into the main text at this point. The location indicates where the author made the notation, and should be one of the following words:
  • above - above the line
  • below - below the line
  • bottom - in the bottom margin
  • end - at the end of the document
  • inline - within the body of the text
  • left - in the left margin
  • opposite - on the opposite facing page
  • overleaf - on the other side of the page
  • right - in the right margin
  • top - in the top margin
If the notation is in multiple locations, list each appropriate keyword separated by an underscore (e.g., top_left). If none of these locations appears applicable, contact the project manager for assistance.
Deleted/Crossed out text [[delete this text was deleted]]
abbreviated form: [[del this text was deleted]]
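
The location keyword of an addition, including combined forms such as top_left, can be checked with a few lines of Python. This is a sketch only; ADD_LOCATIONS and invalid_location_parts are hypothetical names, and the word list is copied from the table above.

```python
# The location keywords listed in this guide.
ADD_LOCATIONS = {
    "above", "below", "bottom", "end", "inline",
    "left", "opposite", "overleaf", "right", "top",
}


def invalid_location_parts(location):
    """Return the underscore-separated parts that are not known keywords."""
    return [part for part in location.split("_") if part not in ADD_LOCATIONS]


print(invalid_location_parts("top_left"))    # every part is a known keyword
print(invalid_location_parts("top_middle"))  # "middle" is not in the list
```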

Concepts

Conceptual elements are dates, places, or things which are significant and may have other information attached to them in the final website. These items should be marked so that they can be indexed and cataloged. These items should be transcribed within the given tags exactly as the author wrote them.

Element Markup Description
Dates [[date July 3, 1884]]
Person [[person Samuel Occom]]
Place [[place Hanover, NH]]
Organization [[organization Dartmouth College]]
abbreviated form: [[org Dartmouth College]]
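
Because conceptual elements are meant to be indexed, it can be useful to list them while proofreading. The Python sketch below is one possible way to collect them, assuming that nested typographic markup (for example a superscript inside a name) should be reduced to its plain text. The names CONCEPT_KEYWORDS and extract_concepts are hypothetical, and this is not the code used by the validation form.

```python
# Maps each keyword, including the abbreviated form, to the kind it denotes.
CONCEPT_KEYWORDS = {
    "date": "date",
    "person": "person",
    "place": "place",
    "organization": "organization",
    "org": "organization",
}


def extract_concepts(text):
    """Return (kind, plain_text) pairs for concept markup found in text."""
    found = []
    stack = []  # one [keyword, text fragments] entry per open "[["
    i = 0
    while i < len(text):
        if text.startswith("[[", i):
            # Read the keyword that follows the opening brackets.
            end = i + 2
            while end < len(text) and not text[end].isspace() and text[end] not in "[]":
                end += 1
            stack.append([text[i + 2:end], []])
            # Skip the single space separating the keyword from its content.
            i = end + 1 if end < len(text) and text[end] == " " else end
        elif text.startswith("]]", i) and stack:
            keyword, parts = stack.pop()
            content = "".join(parts)
            if keyword in CONCEPT_KEYWORDS:
                found.append((CONCEPT_KEYWORDS[keyword], content.strip()))
            if stack:
                # Pass the plain text up to the enclosing element.
                stack[-1][1].append(content)
            i += 2
        else:
            if stack:
                stack[-1][1].append(text[i])
            i += 1
    return found


print(extract_concepts("[[person Gen[[superscript l]] Gage]] at [[place Boston]]"))
```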

Special Symbols

Special symbols are enclosed in pairs of less-than (<) and greater-than (>) signs.

Element Markup Description
m̄ <<mb>> or <<m bar>> "m bar" representing an abbreviation or repeated letter m (m with Unicode u0305)
⅌ <<per>> "Per" symbol (Unicode u214c)
ſ <<ls>> or <<long s>> Long 's' (Unicode u017f)
⁓ <<sd>> or <<swung dash>> Swung Dash (Unicode u2053)
< <<lt>> Less-than sign
> <<gt>> Greater-than sign
[ <<[>> Left square bracket
] <<]>> Right square bracket
arbitrary character <<uxxxx>> Arbitrary Unicode character with code xxxx, where xxxx is the hexadecimal code value

If an arbitrary character is used many times in a collection, please contact DLTG to request a simplified alias for the character (like "long s", etc.) to make the transcription easier to read. This will have no effect on the XML produced, but will make proofreading of the IMS easier.
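
The symbol table can be read as a simple substitution. The following Python sketch expands the aliases and the <<uxxxx>> form into Unicode characters. SYMBOL_ALIASES and expand_symbols are hypothetical names, and rendering "m bar" as the letter m followed by the combining overline (U+0305) is an assumption based on the description above.

```python
import re

# Aliases from the table above.
# Assumption: "m bar" is the letter m followed by the combining overline.
SYMBOL_ALIASES = {
    "mb": "m\u0305", "m bar": "m\u0305",
    "per": "\u214c",
    "ls": "\u017f", "long s": "\u017f",
    "sd": "\u2053", "swung dash": "\u2053",
    "lt": "<", "gt": ">",
    "[": "[", "]": "]",
}

SYMBOL_RE = re.compile(r"<<(.+?)>>")


def expand_symbols(text):
    """Replace <<alias>> and <<uxxxx>> markers with the characters they name."""
    def replace(match):
        name = match.group(1)
        if name in SYMBOL_ALIASES:
            return SYMBOL_ALIASES[name]
        code = re.fullmatch(r"u([0-9A-Fa-f]{4})", name)
        if code:
            return chr(int(code.group(1), 16))
        return match.group(0)  # unknown names are left untouched
    return SYMBOL_RE.sub(replace, text)


print(expand_symbols("Ble<<ls>>ing <<u00e9>>"))
```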

Document Structure

Here are the various acceptable document_types:

Prose (default)

A simple prose document such as a book or essay. A prose document may have any of the following top-level sections: front, body, back. Each section may contain up to 9 levels of divisions, div1...div9, and the contents of each section must be contained within at least div1 sections. Div1 typically corresponds to a chapter.

Letter

A formal letter. A letter may have the following sections:

Section Content
body This section is required and must appear immediately following the document header. The body section must be immediately followed by either a div1 or an address section. Exactly 1 body section is expected in a document.
div1 If present, this section must appear immediately after the body section and must be immediately followed by an opener, content, closer, postscript or trailer section. No more than 1 div1 section is expected in a document.
opener Keyword content:
date: the letter's date line
salutation: the opening salutation

If present it must appear after a div1 section.

content The content section starts the main body of the letter. It must appear after a div1 section and may contain div2 ... div9 sections.
closer Keyword content:
salutation: the closing salutation
signature: the writer's signature

If present it must appear after a div1 section.

postscript Text content consisting of the letter's postscript, with optional div2 - div9 sections. If present it must appear after a div1 section.
trailer Text content of any trailing material, with optional div2 - div9 sections. If present it must appear after a div1 section.
address Text content representing the recipient's address. This section must appear after a body section. A div1 section is required following the address if any additional content follows.

See the example documents and template documents for additional guidance on the accepted structure of a letter.
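
Some of the letter rules above lend themselves to a simple order check. The Python sketch below covers only the rules that are stated unambiguously (one body section at the start, at most one div1, and the sections that must follow a div1); it does not attempt the address rules and is not the validation system's logic. The function name check_letter_sections is hypothetical.

```python
def check_letter_sections(sections):
    """Check a list of section_type values, given in document order."""
    problems = []
    if not sections or sections[0] != "body":
        problems.append("the first section must be body")
    if sections.count("body") != 1:
        problems.append("exactly one body section is expected")
    if sections.count("div1") > 1:
        problems.append("no more than one div1 section is expected")
    if len(sections) > 1 and sections[0] == "body" and sections[1] not in ("div1", "address"):
        problems.append("body must be immediately followed by div1 or address")

    # These sections are described as appearing after a div1 section.
    needs_div1 = {"opener", "content", "closer", "postscript", "trailer"}
    seen_div1 = False
    for name in sections:
        if name == "div1":
            seen_div1 = True
        elif name in needs_div1 and not seen_div1:
            problems.append(f"{name} must appear after a div1 section")
    return problems


print(check_letter_sections(["body", "div1", "opener", "content", "closer"]))
```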

[ed: More structures coming.....]

Example Documents

These examples are for illustration purposes only. The text does not necessarily accurately reflect the proper markup of the original documents.

Example Prose Document

== Document prose ==
== section body ==
== section div1 ==
== Page ms794-001-001 ==
[[add top 250 words to a page]]
	The Chinese Ancestral Rites.
		By Charles D. Tenney,
			Formerly Counselor of the U. S. Legation, Peking, China.

In the year 1700 the Chinese Emperor, Kang Hsi, issued his famous edict
explaining the use of Heaven by the Chinese as meaning God and pronouncing
the rites performed in honor of Confucius and the ancestors to be civil
rites and not worship. The Jesuit missionaries became the advocates of this
view and proposed to allow the Christian converts to continue to perform the
ceremonies. The Catholic missionaries of other orders however, notably the
Dominicans, dissented from the conclusion of the Jesuits, and referred the whole
 matter to the Pope for his decision. The Pope decided against the Jesuits.
This angered the Emperor and led to the first persecution of the Catholic
priests. All missionaries remained proscribed by Chinese law until the
Treaties of 1858 gave them freedom to propagate the Ch[[add inline r]]istian religion.
When the Protestant missionaries arrived in China , they accepted the ruling
of the Pope in regard to the use of Heaven for God and also in regard to the
Confucian and ancestral rites. As they wished to distinguish themselves
before the Chinese from the Catholics[[add inline ,]] the Protestant Church became known
as the [[add inline "]]sect of Jesus[[add inline "]], the Catholic Church being known as the [[add inline "]]sect of the
Lord of Heaven[[add inline "]]. The[[add inline se]] are rather misnomers, of course, because the Protestants
often use the term [[add inline "]]Lord of Heaven[[add inline "]], while the Catholics by no means leave
out [[add inline "]]Jesus[[add inline "]] from their services.

	I have never felt satiafied with the decision of the early Protestant
missionaries to accept the decision of the Pope regarding the Confucian
and ancestral rites. Owing to the fact that these rites have been in use
== Page ms794-001-002 ==
(2)
among the Chinese for many generations, it has become a matter of duty
to continue the ceremonies. So the first act required of a Christian convert
is to do something condemned by his conscience.	I remember well a
conversation which I once had with an intelligent and serious-minded
young Chinese. This young man said to me in my native place a rumor
is current that the first act required of a Christian convert is to split
up the ancestral tablets. I wish you to understand that I do not believe  
this rumor , but I [[add inline would like to]] refer to you as my authority for contradicting it."
I was put in a very embarrassing position by this appeal. I could only say
that Christian converts did not continue to practice the ancestral rites,
though I had never heard of their being required to split up the tablets.
I could see that the young Chinese was convinced by my reply that the rumor
to which he had referred substantially correct. It would have been
very easy to avoid the suspicion of unfilial conduct on the part of the
Christian convert by drawing up a Christian ceremony to take the place of
the old one and so allowed the continuity of the old rite to remain unbroken.
They were wiser in the early days of Christianity about interfering with the
habits of the pagans. The early church fathers simply put a new meaning
into the old ceremonies and allowed them to continue. Thus the spring
ceremony in honor of the goddess Eastre was continued and even made more
elaborate, though a new meaning was put into it. This made the change of
religion easier for the first generation of converts, and after the lapse
of a few generations the old meaning was lost and forgotten. So the
Saturnalia of the Romans and the winter festival of the Britons were
quietly changed to a celebration of the birth of Christ. Even the formerly
sacred mistletoe and holly were not forbidden to be used as decoration.
== Page ms794-001-003 ==
(3)
When an old rite is taken over with a new application or meaning, it soon 
loses its old objectionabl features in favor of the new meaning.

	I once had an experience which almost duplicated the early troubles of
the Catholic missionaries with the Emperor Kang Hsi. I was serving at the time
as president of the government University at Tientsin. The monthly rites
in honor of Confucius were observed at the University as in all the Government
schools of China, but because some of the students came from Christian
families and I knew had been taught that the rites were sinful[[add inline ,]]I made the rule
that there should be no roll call or marking of attendance, so that all who
had conscientious scruples regarding the ceremony might absent themselves.
When Yüan Shih-kai was governor of Shantung in 1901 he had begun to organize
a provincial college and had invited one of the American missionaries to act
as president. After the death of Li Hung-chang, Yüan had been promoted to [[add inline be]]
Viceroy of the metropolitan province and Chou Fu, a distinguished Confucian
scholar had been appointed to succeed him in Shantung. He came to me before
leaving for his new post and explained that he was troubled because the
missionary at the head of the new college had refused to allow the usual
ceremony in Honor of Confucius to be performed in the college, His Excellency
asked how I got over the difficulty , and I explained my arrangement by
which Christian students were allowed to absent themselves from attendance.
The new governor said that this arrangement would be quite satisfactory to
him, provided that the usual rite prescribed for all government schools,
should be continued for the benefit of the Confucianist students. He asked
that I should correspond with with the missionary, suggesting to him my
method of procedure. I did so, but was somewhat surprised to receive the
reply that my method would be unsatisfactory to him because he considered it
to be "dishonest. I communicated the reply to His Excellecy Chou Fu,who
== Page ms794-001-004 ==
(4)
then said that he regretted losing the services of the foreigner in the
important fork of organizing the new college, but that he regarded it as essen-
tial that the usual rites should be observed. He proposed another method of
meeting the scruples of the missionary. He promised that on his arrival at
his new post, he would issue a proclamation explaining the ceremony as an
act of honor to Confucius as the founder of Chinese literature but in no
sense an act of worship. That is, he would repeat the response that the
Emperor K'ang Hsi had made two hundred years before to the petition of the
Jesuits. I felt that this would be useless, but consented to send the message
to the missionary concerned. In due course his reply came to the effect that
he could not allow the continuance of the ceremony because he regarded its
"tendancy to be idolatrous, however it might be explained. The result was
that the well equipped foreigner was obliged to resign and turn over his
important work to less competent Chinese hands. I was much struck by a
remark of Governor Chou Fu in the couse of our conferences on this subject.
He said"It is absurd to say that we worship Confucius. We Confucianists do not
believe in the immortality of the soul. [[delete N]] [[add inline C]]onfucius no longer exists. How can
you worship what does not exist? I considered it very unfortunate that the
control of the new university should be lost to the competent hands of the
foreign missionary through what seemed to me the unreasonable position of
the Proestant Missionary body. the ancestral rites are so connected in
China with the principle of honor to parents and ancestors that the discontin-
uance of the ceremony brands the new faith as unfilial. To the Chinese the
new religion suffers the same handicap that any new faith would suffer among
us if the first act required of a convert were to go the cemetery and spit
upon the graves of his parents. The fine feeling of respect for parents is
one of the best features of the old Chinese civilization and in the process
== Page ms794-001-005 ==
(5)
of the modernizing of China that is now going on[[add inline ,]] that honor to parents ,
the value of which we recognize in the Fifth Commandment ,is fast disappearing.
Instead of requiring Christian converts to pay no more attention to the
ancestral tablets, I should much prefer that the church draw up a new ritual
for Christian converts in which the thanks of the descendant should be
expressed to God for the gift of life through the medium of the ancestors.
The new ritual should include thanksgiving to God for the example and discip-
line of the parents, and the old form of worship should be allowed to continue
with these modifications. Thus the rite might be made even more elaborate
for those within the church than for those outside. Protestans generally
object to the laying out of food as an accom[[add inline pan]]iment of worship or as a sign of
respect,fo[[add inline r]]getting the ritual of worship that is ordered in the Old Testament
according to which food is offered in the worship of Jehovah. We decorate the
graves of the departed with flowers. The Chinese would arrange small dishes
of food about the graves to express the same feeling.

	In general, my experience in China has caused me to feel that the
Protestant missionaries have been rather illiberal in not allowing the Chinese
to express their feeling[[add inline s]] in their own way. They have generally tried to force
upon them the Puritan forms of worship and the customs of other lands rather
than to follow the lines of least resistance as did the old church fathers
in their dealings with the pagans.
        

Example Letter

== document letter ==
== section body ==
== section div1 ==
== section opener ==
== Page 764475-3_001 ==
Date: [[place Lebanon]] [[date 25[[superscript th]] Aug[[superscript t]] 1764]]
Salutation: [[person M[[superscript r.]] Occom]],




Sir.

== section content ==
    Your time is so that, and your Business so 
crowding, that I can't desire such an Addition to your Bur—
den, as your coming hither again would be: I therefore take this 
Way to hint to you what I would say more fully if you were here.

    And in the first place, I suspect you will miss of 
seeing [[person Mr. Kirtland]] on his Return from [[person Mr. Whitefield]], and also 
of seeing [[person Mr. Whitefield]], who I hear preached some weeks ago 
at [[place Philadelphia]], & consequently you will miss of receiving any supplys 
which he may have got for your journey; and if so, I advise you 
to represent the Case to some able Friends at [[place New York]], and if you 
can get Supply no other Way, hire the Money of Some good Friend 
till you return.

    I herewith Send you a Copy of our Commission 
from [[place Scotland]] in order that you may shew it, if you shall have 
occasion, to [[person Gen[[superscript l]] Gage]], [[person Gen[[superscript l]] Johnson]], or others.

    I would have you obtain 15 or 20 youth, if you
can procure those [[delete that]][[add above which]] are likely, of remote Tribes of Indians.
And if you hear that which is encouraging of good [[person Peter]] at 
[[place Onohoquagee]], and those two Boys there who were offered to 
the Comissioner at [[place Boston]], let them be of the Number.

    There was also an English Lad with the Mohawks 
to learn their Tongue, before this War, who I hear is very likely: 
if you can obtain such an one, do it. I shall leave the Proportion 
of Girls to you, & [[person Gen[[superscript l]] Johnson]], whose advice I would have you 
take in every Thing, when it may be had.

    And be sure, you let all the Children whom you 
bring, know that they don't come here to be without Government, 
nor to live a lazy sordid Life, but to be fitted for Business and 
lifefulness in the World. And I am not afraid that you Should boast 
of my Mohawk Boys Proficiency in very strong Terms.

    And don't fail to write to me as your Progress, 
Success, and any Occurrence that may be entertaining, by every 
    opportunity [[note editorial this word is found at the right margin, continuing the previous line]]
== page 764475-3_002 ==
Opportunity, as you know Friends at Home will be glad to hear.

    Send me an Acco[[superscript t]] of what Labour you have or Shall hire 
upon my Credit [[add above at Mohegan]]; and what you desire me to do for your [[add above Family]] while 
you are gone.

    And may the God of all Grace be with you & [[person David]] in 
all the way whither you go, and inspire you with Wisdom, 
Prudence, Zeal, Courage, and holy Fortitude, and honour you 
to be the Instrument to spread the Saviour of his Name, and 
the Knowledge of the great Salvation, far among the Pagans.

== section closer ==
Salutation: Remember me respectfully to Friends in your Way, espe—
cially at [[place N. York]]. — which with Love &c is the needful 
from

    Yours affectionately

Signature: [[person Eleazar Wheelock]].

[[person Rev[[superscript d]] M[[superscript r]] Occom]]

== section postscript == 
[[date August 27th]]  P.S. [[person Mr. Kirtland]] returned last Evening has got no money. 
[[person Mr. Whitefield]] is at [[place N. York]]. talks of going to [[place Albany]] this Week 
if he can he will serve you, if he cant acquaint [[person M[[superscript r.]] Whitaker]] — 
do the best you can — 
          

Template Document

When you start a new transcription, choose one of the templates below to get started. Delete any parts of the template that are not applicable to the document you are transcribing. Remember to fill in the image number for pages if you know the corresponding image number.

Prose

== document prose ==
== section body ==
== section div1 ==
== page ==
(replace this line with the content of the document)
        

Letter

== document letter ==
== section body ==
== section div1 ==
== section opener ==
== page ==
Date:
Salutation:
== section content ==
== section closer ==
Salutation:
Signature:
== section postscript ==
== section trailer ==
== section address ==
        

Applications

The transcriptions must be saved as plain text (.txt) documents without any application-specific markup. Word (.doc or .docx) or RTF (.rtf) documents are not acceptable. Documents may be edited in Word or other text editors provided they are saved as "text-only" documents. Please review the instructions below that apply to the application you are using to ensure that you are saving the documents in the correct format.

Microsoft Word (Mac - 2011)

Word is able to edit and save text documents, but the process is relatively complex compared to other editors. We recommend using Word only if no other editor is available.

Create a new document as follows:

  1. Choose File > New Document
  2. Choose File > Save As...
  3. Choose Format: Plain Text (.txt)
  4. Click Save (the File Conversion dialog box will appear)
  5. Under Text Encoding, choose Other Encoding and click Unicode 5.1 UTF-8
  6. Choose End Lines with LF only
  7. Make sure "Insert line breaks" and "Allow character substitution" are not checked
  8. Click OK.

Notes:

  1. You will receive a warning each time you save the document indicating that some formatting may be lost. Word displays this warning even with empty documents and even if the document contains no formatting. You must click "Save" to save the document.
  2. Word will not remember the "Save" settings, so the next time you save the document, the Text Encoding and line endings will not be correct. You must use "Save As" each time you save the document starting with Step 2 above in order to save the document properly.
  3. While editing the file, you must take care not to use any Word formatting such as bold, italic or different fonts. This information will not be saved in the text file. Use only the markup described in this document.

TextEdit (Mac)

TextEdit is an application supplied with the Macintosh operating system. It is capable of editing plain text (.txt) and RTF (.rtf) documents, and is much easier to use than Word for plain text documents. It is found in the "Applications" folder.

To create a new plain text (.txt) document in TextEdit:

  1. Choose File > New
  2. Choose Format > Make Plain Text (see notes)
  3. Choose File > Save to save the document.
  4. Choose Plain Text Encoding: Unicode (UTF-8) (this should be the default, and will only be necessary the first time you save the file)

Notes:

  1. Using the TextEdit preferences you can choose whether the New command creates a plain text or RTF document by default. If a document is an RTF document, TextEdit displays a ruler at the top of the page. If you do not see the ruler and cannot find the "Make Plain Text" command, you already have a plain text document. While you are working on the Occom Circle project, it is recommended that you set this preference so that TextEdit automatically creates new plain text documents.
  2. The font and type size for plain text documents is changed in the TextEdit preferences. You may set this preference to any font and size you find convenient.

Notepad (Windows)

Notepad++ (free)
http://notepad-plus-plus.org/
http://notepad-plus-plus.org/release/5.9
http://download.tuxfamily.org/notepadplus/5.9/npp.5.9.Installer.exe

Notepad (comes with Windows)
All Programs --> Accessories --> Notepad
Notepad defaults to ANSI text, but you can choose Unicode or UTF-8 in the "Save As" dialog box.

Word (Windows)

Word 2010 on Windows 7
Choose the No Spacing style
Save As "Plain Text (*.txt)"
You then get a dialog box with warnings about losing formatting, where you
can choose a line ending other than the default of CR|LF. The choices are:

CR|LF
CR only
LF only
LF|CR

Choose LF only.

Other Applications

Other applications are available for the Macintosh and PC that are capable of editing plain text files. If you wish to use something other than one of the applications listed above, please consult with the project manager before beginning to ensure that your application is compatible with the system.
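
If a file was saved with a different encoding or with Windows or classic Mac line endings, it can be converted afterwards. The Python sketch below shows one way to produce UTF-8 text with LF-only line endings, the settings the Word instructions above ask for. The function name to_utf8_lf and the source_encoding parameter are hypothetical; the caller has to know the encoding the file was actually saved in (for example "cp1252" for Windows ANSI text).

```python
def to_utf8_lf(raw, source_encoding="utf-8"):
    """Return raw bytes re-encoded as UTF-8 with LF-only line endings.

    Handles CR|LF and CR-only line endings; the rarer LF|CR order is not
    handled by this sketch.
    """
    text = raw.decode(source_encoding)
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    return text.encode("utf-8")


windows_bytes = "Y\u00fcan Shih-kai\r\nline 2\r\n".encode("cp1252")
print(to_utf8_lf(windows_bytes, source_encoding="cp1252"))
```

To convert a file on disk, read it in binary mode, pass the bytes through the function, and write the result back in binary mode.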

Validation and Submission

After the document transcription is complete, the document must be validated. This is accomplished through the Text Markup Validation Form. For valid documents the form offers the option of emailing the document or submitting it to XMLCollections. Please contact the project manager for details about how to handle valid documents for your project.

If a document is valid, the form displays a color-coded version of the document to facilitate proofreading the markup, and a list of key elements (people, places, organizations) found in the document. After the document is proofread, a link on the page emails the original transcription and translated XML document to the project manager.

If markup errors are found in the document, a list of problems is displayed with the line numbers and text surrounding each error. These must be corrected before the document can be submitted to the manager.

Post-Translation Tasks

The translation to TEI markup leaves placeholders for certain elements that must be filled in manually: