Overview
The Initial Markup Scheme (IMS) for TEI text projects provides a means for rapid transcription of the majority of markup in simple text documents. The IMS is easier to read and type than pure XML and is readily converted into TEI XML. After conversion the resulting XML document can be proofread and enhanced as required by an individual project to produce a final XML document of any desired complexity.
The elements of a document that can be represented by the IMS fall into several categories:
- Major structural elements such as pages, chapters, subchapters and other major divisions.
- Text level structural elements including paragraphs and line breaks.
- Typographic elements such as underlines, superscripts, large or small characters.
- Conceptual elements including names of people, places, organizations and dates.
Markup
Structure
The structure of a document includes the chapters and subsections, and page breaks. Not all documents will include all of these elements. These structural elements are indicated as follows:
Element | Markup | Description |
---|---|---|
Document Header | == Document document_type == | Different kinds of documents have different structures. The document header indicates to the validation system what kind of document it should expect. See the Document Structure section for details on the possible values of "document_type". If no Document Header is found, the document is assumed to have a type value of "prose". |
Page | == Page image_number == | Must be placed at the start of each page, including the first line of the document. The image number is optional, and may be omitted if unknown. Each page in the document must be included in order in the transcription, even if the page is blank. If a page break occurs between paragraphs, a blank line must precede and follow the page marker. If the break occurs within a paragraph, the blank lines must be omitted. The document number is assumed to be all of the characters in the image_number preceding the first "_". |
Section | == Section section_type == | A section describes a major structural division of a document. Each section has a section_type which identifies the name of the section and implicitly indicates how the section relates to other sections in the document. Documents may have any number of sections, but sections follow a rigid structure which permits only certain kinds of sections at any given point in the document. The "document_type" specified in the Document Header determines which sections may appear in the document. |
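The structural markers in the table above share one shape: a keyword and an optional value between pairs of equals signs. The following Python sketch shows one way such lines could be recognized. The names `parse_marker` and `document_number` are hypothetical, and this is not the validation system's own code. It assumes that keywords are case-insensitive, since the example documents use both "Page" and "page".

```python
import re

# Matches lines such as "== Page ms794-001-001 ==" or "== page ==".
# Assumption: keywords are case-insensitive and the value is optional.
MARKER = re.compile(
    r"^==\s*(document|page|section)(?:\s+(\S+))?\s*==$", re.IGNORECASE
)


def parse_marker(line):
    """Return (keyword, value) for a structural marker line, else None."""
    match = MARKER.match(line.strip())
    if match is None:
        return None
    return match.group(1).lower(), match.group(2)


def document_number(image_number):
    """Return the characters of the image number before the first underscore."""
    return image_number.split("_", 1)[0]
```

For example, `parse_marker("== Page 764475-3_001 ==")` yields the keyword "page" and the image number, and `document_number` then reduces that image number to the part before the first "_".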
Text Blocks
Blocks of text (lines and paragraphs) should be typed as they appear in the document. A paragraph should be followed by one or more blank lines unless it is the last paragraph in a section.
Element | Markup | Description |
---|---|---|
Line Breaks | line 1 line 2 | Line breaks are denoted by line breaks in the text. |
Paragraphs | paragraph 1 paragraph 2 | Paragraphs are denoted by a blank line in the text. |
Indented text | indented line | Lines that are indented should begin with a tab or space(s). |
Typography
Typographic markup is enclosed within [[ ... ]] brackets. Immediately following the opening brackets is a keyword indicating the kind of markup, then a space, then the text that is marked up. The text may contain other markup, including structural markup such as lines, paragraphs or page breaks as described above. For each pair of opening brackets, there must be a corresponding pair of closing brackets.
Example: (note the use of spaces between ]] and ]] to improve legibility)
this is normal text [[bold this text will appear bold [[italic this will be bold italic text ]] ]] this is normal text
Rendered as:
this is normal text this text will appear bold this will be bold italic text this is normal text
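The nesting rule (every "[[" needs a matching "]]") can be checked mechanically. Below is a minimal Python sketch that turns bracket markup into a nested tree and reports unbalanced brackets. The function name `parse` and the tree layout are assumptions made for illustration; the sketch only looks at double brackets and does not interpret the single-bracket guesses used for illegible text.

```python
import re

TOKEN = re.compile(r"\[\[|\]\]")        # an opening or a closing bracket pair
KEYWORD = re.compile(r"[^\s\[\]]*")     # keyword: up to the first space or bracket


def parse(text):
    """Parse '[[keyword text]]' markup into a nested tree.

    Plain text becomes str items; each tag becomes (keyword, children).
    Raises ValueError when opening and closing brackets do not pair up.
    """
    root = []
    stack = [root]                      # stack[-1] is the list being filled
    pos = 0
    for match in TOKEN.finditer(text):
        if match.start() > pos:
            stack[-1].append(text[pos:match.start()])
        pos = match.end()
        if match.group() == "[[":
            keyword = KEYWORD.match(text, pos).group()
            pos += len(keyword)
            if text[pos:pos + 1] == " ":
                pos += 1                # skip the space after the keyword
            children = []
            stack[-1].append((keyword, children))
            stack.append(children)
        else:
            if len(stack) == 1:
                raise ValueError("closing ]] without a matching [[")
            stack.pop()
    if len(stack) > 1:
        raise ValueError("opening [[ without a matching ]]")
    if pos < len(text):
        root.append(text[pos:])
    return root
```

Keeping the open tags on a stack means the innermost tag is always closed first, which matches the way nested markup such as bold italic text is written.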
Element | Markup | Description |
---|---|---|
Block Letters | [[blockletters block lettered text appears here]] abbreviated form: [[block block lettered text appears here]] | Rendered as: block lettered text appears here |
bold | [[bold bold text appears here]] | Rendered as: bold text appears here |
italic | [[italic italic text appears here]] | Rendered as: italic text appears here |
underline | [[underline underlined text appears here]] | Rendered as: underlined text appears here |
large letters | [[large large text appears here]] normal text here | Rendered as: large text appears here normal text here |
superscript | normal text [[superscript superscript text appears here]] abbreviated form: [[sup superscript text appears here]] | Rendered as: normal text superscript text appears here |
Illegible Text
Markup of illegible text depends on the reason for the illegibility: gaps in the text, damage to the document, or merely illegible text are indicated with one of the tags listed below. If a reasonable guess about the original text can be made, this can be indicated within the tag by enclosing the guess within [ ... ] after the first ] bracket. More than one guess may be supplied if desired, by enclosing each within single brackets.
Example: (note the use of spaces between ] and [ to improve legibility)
[[illegible ] [my first guess] [my second guess]]
Element | Markup | Description |
---|---|---|
Gaps, missing or entirely unreadable text. | [[gap reason] [my first guess] [my second guess]] | The kind of gap should be indicated by the reason. It must be one of the following statements: |
Illegible text | [[illegible] [my guess]] | Use for any illegible text that does not fall into the first two categories. |
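The gap and illegible tags differ from the typographic tags in that the first closing bracket is single and any guesses follow in their own single brackets. A rough Python sketch of reading such a tag is given here. The function name `parse_unclear` and the returned dictionary are hypothetical, and the sketch does not check the reason against the list of permitted statements.

```python
import re

# "[[" + kind + optional reason + "]" + zero or more "[guess]" + final "]"
TAG = re.compile(r"\[\[(gap|illegible)\s*([^\]]*)\]((?:\s*\[[^\]]*\])*)\s*\]")
GUESS = re.compile(r"\[([^\]]*)\]")


def parse_unclear(markup):
    """Split a gap/illegible tag into its kind, reason and list of guesses."""
    match = TAG.fullmatch(markup.strip())
    if match is None:
        return None
    kind, reason, guesses = match.groups()
    return {
        "kind": kind,
        "reason": reason.strip() or None,   # None when no reason is given
        "guesses": GUESS.findall(guesses),
    }
```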
Notes
Notes are informational text added by the transcriber. They may be used as markers for parts of the text that require further review, or to describe any other aspect of the transcription that the transcriber wishes to record.
Element | Markup | Description |
---|---|---|
Note | [[note kind text of the note]] e.g. [[note editorial check the transcription of this paragraph]] | The note kind should be one of the following words: |
Changes
Areas where the author has crossed out or added text are covered by the changes. Multiple changes can be nested if warranted. If, for example, the author indicated text was to be added to a place in the text, but crossed out part of the added text, this should be indicated by a deletion within the addition.
Element | Markup | Description |
---|---|---|
Added/Inserted text | [[add location this is inserted text]] | The author indicated that text should be inserted into the main text at this point. The location indicates where the author made the notation, and should be one of the following words: |
Deleted/Crossed out text | [[delete this text was deleted]] abbreviated form: [[del this text was deleted]] |
Concepts
Conceptual elements are dates, places, or things which are significant and may have other information attached to them in the final website. These items should be marked so that they can be indexed and cataloged. These items should be transcribed within the given tags exactly as the author wrote them.
Element | Markup | Description |
---|---|---|
Dates | [[date July 3, 1884]] | |
Person | [[person Samuel Occom]] | |
Place | [[place Hanover, NH]] | |
Organization | [[organization Dartmouth College]] abbreviated form: [[org Dartmouth College]] |
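Because conceptual tags exist so that items can be indexed, it may help to see how a list of people, places and organizations could be pulled out of a transcription. The Python sketch below is an illustration only: the function name `index_concepts` is hypothetical, and flattening nested markup (such as a superscript inside a name) into plain text is an assumption made here, not documented behaviour of the validation system.

```python
import re
from collections import defaultdict

CONCEPTS = {"date", "person", "place", "organization", "org"}
# Either "[[keyword " (group 1 holds the keyword) or a closing "]]".
TOKEN = re.compile(r"\[\[([^\s\[\]]+)\s?|\]\]")


def index_concepts(text):
    """Collect the text inside each conceptual tag, grouped by kind."""
    found = defaultdict(list)
    open_tags = []                      # each entry: [keyword, text pieces]
    pos = 0
    for match in TOKEN.finditer(text):
        piece = text[pos:match.start()]
        for tag in open_tags:           # text belongs to every enclosing tag
            tag[1].append(piece)
        pos = match.end()
        if match.group(1):
            open_tags.append([match.group(1).lower(), []])
        elif open_tags:
            keyword, pieces = open_tags.pop()
            if keyword in CONCEPTS:
                name = "organization" if keyword == "org" else keyword
                found[name].append("".join(pieces))
    return dict(found)
```

Applied to the salutation of the example letter, `[[person M[[superscript r.]] Occom]]`, this sketch records the person as "Mr. Occom".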
Special Symbols
Special symbols are enclosed in pairs of less-than (<) and greater-than (>) signs.
Element | Markup | Description |
---|---|---|
m̅ | <<mb>> <<m bar>> | "m bar" representing an abbreviation or repeated letter m (m with Unicode u0305) |
⅌ | <<per>> | "Per" symbol (Unicode u214c) |
ſ | <<ls>> <<long s>> | Long 's' (Unicode u017f) |
⁓ | <<sd>> <<swung dash>> | Swung Dash (Unicode u2053) |
< | <<lt>> | Less-than sign |
> | <<gt>> | Greater-than sign |
[ | <<[>> | Left square bracket |
] | <<]>> | Right square bracket |
arbitrary character | <<uxxxx>> | Arbitrary Unicode character with code xxxx, where xxxx is the hexadecimal code value |
If an arbitrary character is used many times in a collection, please contact DLTG to request a simplified alias for the character (like "long s", etc.) to make the transcription easier to read. This will have no effect on the XML produced, but will make proofreading of the IMS easier.
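To make the alias mechanism concrete, here is a small Python sketch that expands the symbol markup into the characters listed in the table. The function name `expand_symbols` is hypothetical, and leaving unknown aliases untouched is an assumption; the real converter may report them as errors instead.

```python
import re

# Aliases and code points as listed in the Special Symbols table.
ALIASES = {
    "mb": "m\u0305", "m bar": "m\u0305",     # m followed by a combining overline
    "per": "\u214c",
    "ls": "\u017f", "long s": "\u017f",
    "sd": "\u2053", "swung dash": "\u2053",
    "lt": "<", "gt": ">",
    "[": "[", "]": "]",
}

SYMBOL = re.compile(r"<<(.+?)>>")


def expand_symbols(text):
    """Replace <<alias>> and <<uxxxx>> markup with the characters they name."""
    def replace(match):
        name = match.group(1)
        if name in ALIASES:
            return ALIASES[name]
        code = re.fullmatch(r"u([0-9a-fA-F]{4})", name)
        if code:
            return chr(int(code.group(1), 16))
        return match.group(0)               # unknown alias: leave untouched
    return SYMBOL.sub(replace, text)
```

Since an alias and its `<<uxxxx>>` form expand to the same character, switching to an alias changes only how the transcription reads, not what is produced.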
Document Structure
Here are the various acceptable document_types:
Prose (default)
A simple prose document such as a book or essay. A prose document may have any of the following top-level sections: front, body, back. Each section may contain up to 9 levels of divisions, div1...div9, and the contents of each section must be contained within at least div1 sections. Div1 typically corresponds to a chapter.
Letter
A formal letter. A letter may have the following sections:
Section | Content |
---|---|
body | This section is required and must appear immediately following the document header. The body section must be immediately followed by either a div1 or an address section. Exactly 1 body section is expected in a document. |
div1 | If present, this section must appear immediately after the body section and it must appear before and be immediately followed by an opener, content, closer, postscript or trailer section. No more than 1 div1 section is expected in a document. |
opener | Keyword content: date: the letter's date line; salutation: the opening salutation. If present, it must appear after a div1 section. |
content | The content section starts the main body of the letter. It must appear after a div1 section and may contain div2 ... div9 sections. |
closer | Keyword content: salutation: the closing salutation; signature: the writer's signature. If present, it must appear after a div1 section. |
postscript | Text content consisting of the letter's postscript, with optional div2 - div9 sections. If present, it must appear after a div1 section. |
trailer | Text content of any trailing material, with optional div2 - div9 sections. If present it must appear after a div1 section. |
address | Text content representing the recipient's address. This section must appear after a body section. A div1 section is required following the address if any additional content follows. |
See the example documents and template documents for additional guidance on the accepted structure of a letter.
[ed: More structures coming.....]
Example Documents
These examples are for illustration purposes only. The text does not necessarily accurately reflect the proper markup of the original documents.
Example Prose Document
== Document prose == == section body == == section div1 == == Page ms794-001-001 == [[add top 250 words to a page]] The Chinese Ancestral Rites. By Charles D. Tenney, Formerly Counselor of the U. S. Legation, Peking, China. In the year 1700 the Chinese Emperor, Kang Hsi, issued his famous edict explaining the use of Heaven by the Chinese as meaning God and pronouncing the rites performed in honor of Confucius and the ancestors to be civil rites and not worship. The Jesuit missionaries became the advocates of this view and proposed to allow the Christian converts to continue to perform the ceremonies. The Catholic missionaries of other orders however, notably the Dominicans, dissented from the conclusion of the Jesuits, and referred the whole matter to the Pope for his decision. The Pope decided against the Jesuits. This angered the Emperor and led to the first persecution of the Catholic priests. All missionaries remained proscribed by Chinese law until the Treaties of 1858 gave them freedom to propagate the Ch[[add inline r]]istian religion. When the Protestant missionaries arrived in China , they accepted the ruling of the Pope in regard to the use of Heaven for God and also in regard to the Confucian and ancestral rites. As they wished to distinguish themselves before the Chinese from the Catholics[[add inline ,]] the Protestant Church became known as the [[add inline "]]sect of Jesus[[add inline "]], the Catholic Church being known as the [[add inline "]]sect of the Lord of Heaven[[add inline "]]. The[[add inline se]] are rather misnomers, of course, because the Protestants often use the term [[add inline "]]Lord of Heaven[[add inline "]], while the Catholics by no means leave out [[add inline "]]Jesus[[add inline "]] from their services. I have never felt satiafied with the decision of the early Protestant missionaries to accept the decision of the Pope regarding the Confucian and ancestral rites. 
Owing to the fact that these rites have been in use == Page ms794-001-002 == (2) among the Chinese for many generations, it has become a matter of duty to continue the ceremonies. So the first act required of a Christian convert is to do something condemned by his conscience. I remember well a conversation which I once had with an intelligent and serious-minded young Chinese. This young man said to me in my native place a rumor is current that the first act required of a Christian convert is to split up the ancestral tablets. I wish you to understand that I do not believe this rumor , but I [[add inline would like to]] refer to you as my authority for contradicting it." I was put in a very embarrassing position by this appeal. I could only say that Christian converts did not continue to practice the ancestral rites, though I had never heard of their being required to split up the tablets. I could see that the young Chinese was convinced by my reply that the rumor to which he had referred substantially correct. It would have been very easy to avoid the suspicion of unfilial conduct on the part of the Christian convert by drawing up a Christian ceremony to take the place of the old one and so allowed the continuity of the old rite to remain unbroken. They were wiser in the early days of Christianity about interfering with the habits of the pagans. The early church fathers simply put a new meaning into the old ceremonies and allowed them to continue. Thus the spring ceremony in honor of the goddess Eastre was continued and even made more elaborate, though a new meaning was put into it. This made the change of religion easier for the first generation of converts, and after the lapse of a few generations the old meaning was lost and forgotten. So the Saturnalia of the Romans and the winter festival of the Britons were quietly changed to a celebration of the birth of Christ. Even the formerly sacred mistletoe and holly were not forbidden to be used as decoration. 
== Page ms794-001-003 == (3) When an old rite is taken over with a new application or meaning, it soon loses its old objectionabl features in favor of the new meaning. I once had an experience which almost duplicated the early troubles of the Catholic missionaries with the Emperor Kang Hsi. I was serving at the time as president of the government University at Tientsin. The monthly rites in honor of Confucius were observed at the University as in all the Government schools of China, but because some of the students came from Christian families and I knew had been taught that the rites were sinful[[add inline ,]]I made the rule that there should be no roll call or marking of attendance, so that all who had conscientious scruples regarding the ceremony might absent themselves. When Yüan Shih-kai was governor of Shantung in 1901 he had begun to organize a provincial college and had invited one of the American missionaries to act as president. After the death of Li Hung-chang, Yüan had been promoted to [[add inline be]] Viceroy of the metropolitan province and Chou Fu, a distinguished Confucian scholar had been appointed to succeed him in Shantung. He came to me before leaving for his new post and explained that he was troubled because the missionary at the head of the new college had refused to allow the usual ceremony in Honor of Confucius to be performed in the college, His Excellency asked how I got over the difficulty , and I explained my arrangement by which Christian students were allowed to absent themselves from attendance. The new governor said that this arrangement would be quite satisfactory to him, provided that the usual rite prescribed for all government schools, should be continued for the benefit of the Confucianist students. He asked that I should correspond with with the missionary, suggesting to him my method of procedure. 
I did so, but was somewhat surprised to receive the reply that my method would be unsatisfactory to him because he considered it to be "dishonest. I communicated the reply to His Excellecy Chou Fu,who == Page ms794-001-004 == (4) then said that he regretted losing the services of the foreigner in the important fork of organizing the new college, but that he regarded it as essen- tial that the usual rites should be observed. He proposed another method of meeting the scruples of the missionary. He promised that on his arrival at his new post, he would issue a proclamation explaining the ceremony as an act of honor to Confucius as the founder of Chinese literature but in no sense an act of worship. That is, he would repeat the response that the Emperor K'ang Hsi had made two hundred years before to the petition of the Jesuits. I felt that this would be useless, but consented to send the message to the missionary concerned. In due course his reply came to the effect that he could not allow the continuance of the ceremony because he regarded its "tendancy to be idolatrous, however it might be explained. The result was that the well equipped foreigner was obliged to resign and turn over his important work to less competent Chinese hands. I was much struck by a remark of Governor Chou Fu in the couse of our conferences on this subject. He said"It is absurd to say that we worship Confucius. We Confucianists do not believe in the immortality of the soul. [[delete N]] [[add inline C]]onfucius no longer exists. How can you worship what does not exist? I considered it very unfortunate that the control of the new university should be lost to the competent hands of the foreign missionary through what seemed to me the unreasonable position of the Proestant Missionary body. the ancestral rites are so connected in China with the principle of honor to parents and ancestors that the discontin- uance of the ceremony brands the new faith as unfilial. 
To the Chinese the new religion suffers the same handicap that any new faith would suffer among us if the first act required of a convert were to go the cemetery and spit upon the graves of his parents. The fine feeling of respect for parents is one of the best features of the old Chinese civilization and in the process == Page ms794-001-005 == (5) of the modernizing of China that is now going on[[add inline ,]] that honor to parents , the value of which we recognize in the Fifth Commandment ,is fast disappearing. Instead of requiring Christian converts to pay no more attention to the ancestral tablets, I should much prefer that the church draw up a new ritual for Christian converts in which the thanks of the descendant should be expressed to God for the gift of life through the medium of the ancestors. The new ritual should include thanksgiving to God for the example and discip- line of the parents, and the old form of worship should be allowed to continue with these modifications. Thus the rite might be made even more elaborate for those within the church than for those outside. Protestans generally object to the laying out of food as an accom[[add inline pan]]iment of worship or as a sign of respect,fo[[add inline r]]getting the ritual of worship that is ordered in the Old Testament according to which food is offered in the worship of Jehovah. We decorate the graves of the departed with flowers. The Chinese would arrange small dishes of food about the graves to express the same feeling. In general, my experience in China has caused me to feel that the Protestant missionaries have been rather illiberal in not allowing the Chinese to express their feeling[[add inline s]] in their own way. They have generally tried to force upon them the Puritan forms of worship and the customs of other lands rather than to follow the lines of least resistance as did the old church fathers in their dealings with the pagans.
Example Letter
== document letter == == section body == == section div1 == == section opener == == Page 764475-3_001 == Date: [[place Lebanon]] [[date 25[[superscript th]] Aug[[superscript t]] 1764]] Salutation: [[person M[[superscript r.]] Occom]], Sir. == section content == Your time is so that, and your Business so crowding, that I can't desire such an Addition to your Bur— den, as your coming hither again would be: I therefore take this Way to hint to you what I would say more fully if you were here. And in the first place, I suspect you will miss of seeing [[person Mr. Kirtland]] on his Return from [[person Mr. Whitefield]], and also of seeing [[person Mr. Whitefield]], who I hear preached some weeks ago at [[place Philadelphia]], & consequently you will miss of receiving any supplys which he may have got for your journey; and if so, I advise you to represent the Case to some able Friends at [[place New York]], and if you can get Supply no other Way, hire the Money of Some good Friend till you return. I herewith Send you a Copy of our Commission from [[place Scotland]] in order that you may shew it, if you shall have occasion, to [[person Gen[[superscript l]] Gage]], [[person Gen[[superscript l]] Johnson]], or others. I would have you obtain 15 or 20 youth, if you can procure those [[delete that]][[add above which]] are likely, of remote Tribes of Indians. And if you hear that which is encouraging of good [[person Peter]] at [[place Onohoquagee]], and those two Boys there who were offered to the Comissioner at [[place Boston]], let them be of the Number. There was also an English Lad with the Mohawks to learn their Tongue, before this War, who I hear is very likely: if you can obtain such an one, do it. I shall leave the Proportion of Girls to you, & [[person Gen[[superscript l]] Johnson]], whose advice I would have you take in every Thing, when it may be had. 
And be sure, you let all the Children whom you bring, know that they don't come here to be without Government, nor to live a lazy sordid Life, but to be fitted for Business and lifefulness in the World. And I am not afraid that you Should boast of my Mohawk Boys Proficiency in very strong Terms. And don't fail to write to me as your Progress, Success, and any Occurrence that may be entertaining, by every opportunity [[note editorial this word is found at the right margin, continuing the previous line]] == page 764475-3_002 == Opportunity, as you know Friends at Home will be glad to hear. Send me an Acco[[superscript t]] of what Labour you have or Shall hire upon my Credit [[add above at Mohegan]]; and what you desire me to do for your [[add above Family]] while you are gone. And may the God of all Grace be with you & [[person David]] in all the way whither you go, and inspire you with Wisdom, Prudence, Zeal, Courage, and holy Fortitude, and honour you to be the Instrument to spread the Saviour of his Name, and the Knowledge of the great Salvation, far among the Pagans. == section closer == Salutation: Remember me respectfully to Friends in your Way, espe— cially at [[place N. York]]. — which with Love &c is the needful from Yours affectionately Signature: [[person Eleazar Wheelock]]. [[person Rev[[superscript d]] M[[superscript r]] Occom]] == section postscript == [[date August 27th]] P.S. [[person Mr. Kirtland]] returned last Evening has got no money. [[person Mr. Whitefield]] is at [[place N. York]]. talks of going to [[place Albany]] this Week if he can he will serve you, if he cant acquaint [[person M[[superscript r.]] Whitaker]] — do the best you can —
Template Document
When you start a new transcription, choose one of the templates below to get started. Delete any parts of the template that are not applicable to the document you are transcribing. Remember to fill in the image number for pages if you know the corresponding image number.
Prose
== document prose ==
== section body ==
== section div1 ==
== page ==
(replace this line with the content of the document)
Letter
== document letter ==
== section body ==
== section div1 ==
== section opener ==
== page ==
Date:
Salutation:
== section content ==
== section closer ==
Salutation:
Signature:
== section postscript ==
== section trailer ==
== section address ==
Applications
The transcriptions must be saved as plain text (.txt) documents without any application specific markup. Word (.doc or .docx) or RTF (.rtf) documents are not acceptable. Documents may be edited in Word or other text editors provided they are saved as "text-only" documents. Please review the instructions below that apply to the application you are using to ensure that you are saving the documents in the correct format.
Microsoft Word (Mac - 2011)
Word is able to edit and save text documents, but the process is relatively complex compared to other editors. We recommend using Word only if no other editor is available.
Create a new document as follows:
- Choose File > New Document
- Choose File > Save As...
- Choose Format: Plain Text (.txt)
- Click Save (the File Conversion dialog box will appear)
- Choose Text Encoding: Other Encoding, and click Unicode 5.1 UTF-8
- Choose End Lines with: LF only
- Make sure "Insert line breaks" and "Allow character substitution" are not checked
- Click Ok.
Notes:
- You will receive a warning each time you save the document indicating that some formatting may be lost. Word displays this warning even with empty documents and even if the document contains no formatting. You must click "Save" to save the document.
- Word will not remember the "Save" settings, so the next time you save the document, the Text Encoding and line endings will not be correct. You must use "Save As" each time you save the document starting with Step 2 above in order to save the document properly.
- While editing the file, you must take care not to use any Word formatting such as bold, italic or different fonts. This information will not be saved in the text file. Use only the markup described in this document.
TextEdit (Mac)
TextEdit is an application supplied with the Macintosh operating system. It is capable of editing plain text (.txt) and RTF (.rtf) documents, and is much easier to use than Word for plain text documents. It is found in the "Applications" folder.
To create a new plain text (.txt) document in TextEdit:
- Choose File > New
- Choose Format > Make Plain Text (see notes)
- Choose File > Save to save the document.
- Choose Plain Text Encoding: Unicode (UTF-8) (this should be the default, and will only be necessary the first time you save the file)
Notes:
- Using the TextEdit preferences you can choose whether the New command creates a plain text or RTF document by default. If a document is an RTF document, TextEdit displays a ruler at the top of the page. If you do not see the ruler and cannot find the "Make Plain Text" command, you already have a plain text document. While you are working on the Occom Circle project, it is recommended that you set this preference so that TextEdit automatically creates new plaintext documents.
- The font and type size for plaintext documents is changed in the TextEdit preferences. You may set this preference to any font and size you find convenient.
Notepad (Windows)
- Notepad++ (free): http://notepad-plus-plus.org/ http://notepad-plus-plus.org/release/5.9 http://download.tuxfamily.org/notepadplus/5.9/npp.5.9.Installer.exe
- Notepad (comes with Windows): All Programs --> Accessories --> Notepad. Notepad defaults to ANSI text, but you can choose Unicode or UTF-8 when doing a "Save As".
Word (Windows)
Word 2010 on Windows 7:
- Choose the No Spacing style.
- Save As "Plain Text (*.txt)".
- You then get a dialogue box with warnings about losing formatting, where you can choose to insert line breaks different than the default of CR|LF. The choices are CR|LF, CR only, LF only, and LF|CR.
- Choose LF only.
Other Applications
Other applications are available for the Macintosh and PC that are capable of editing plain text files. If you wish to use something other than one of the applications listed above, please consult with the project manager before beginning to ensure that your application is compatible with the system.
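Whatever application is used, the saved file can be checked against the settings requested in this section (plain text, UTF-8 encoding, LF-only line endings). The Python sketch below is one way to do that. The function name `check_plain_text` is hypothetical and it is not part of the project's validation form.

```python
def check_plain_text(data):
    """Return a list of problems found in the raw bytes of a transcription."""
    problems = []
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        problems.append("not valid UTF-8")
    if b"\r" in data:
        # CR|LF and CR-only files both contain a carriage return byte.
        problems.append("contains CR line endings; expected LF only")
    if data.startswith(b"{\\rtf"):
        problems.append("looks like an RTF document, not plain text")
    return problems
```

A file's bytes can be read with `open(path, "rb").read()` and passed to the function; an empty list means none of these three problems was found.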
Validation and Submission
After the document transcription is complete, the document must be validated. This is accomplished through the Text Markup Validation Form. For valid documents the form offers the option of emailing the document or submitting it to XMLCollections. Please contact the project manager for details about how to handle valid documents for your project.
If a document is valid, the form displays a color-coded version of the document to facilitate proofreading the markup, and a list of key elements (people, places, organizations) found in the document. After the document is proofread, a link on the page emails the original transcription and translated XML document to the project manager.
If markup errors are found in the document, a list of problems is displayed with the line numbers and text surrounding each error. These must be corrected before the document can be submitted to the manager.
Post-Translation Tasks
The translation to TEI markup leaves placeholders for certain elements that must be filled in manually:
- The "when" attribute of <date> tags requires the canonical date in year-month-day format.
- The "key" attribute of <persName>, <orgName>, and <placeName> tags requires the id of the item in the corresponding authority list.
- (?) The rend attribute of <del> tags must be supplied.
- Words containing the m-bar or long-s characters must be normalized using the following tags: <choice><orig></orig><reg></reg></choice>
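Finding the placeholders by eye is tedious in long documents, so a short script can list them. The Python sketch below assumes that a placeholder is an attribute that is either absent or empty; the exact form of the placeholders in the translated XML is not described here, so treat that as an assumption. The function name `missing_attributes` and the key value "occom" in the usage example are hypothetical.

```python
import xml.etree.ElementTree as ET

# Element name -> attribute that must be filled in by hand.
REQUIRED = {"date": "when", "persName": "key", "orgName": "key", "placeName": "key"}


def missing_attributes(xml_text):
    """List (tag, attribute, text) for elements whose attribute is empty or absent."""
    missing = []
    for element in ET.fromstring(xml_text).iter():
        tag = element.tag.rsplit("}", 1)[-1]    # drop any namespace prefix
        attribute = REQUIRED.get(tag)
        if attribute and not element.get(attribute):
            missing.append((tag, attribute, "".join(element.itertext())))
    return missing
```

For a fragment such as `<p><date when="">July 3, 1884</date> <persName key="occom">Samuel Occom</persName></p>`, only the date would be reported, since its "when" attribute is still empty.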