Introduction
~~~~~~~~~~~~

This is Mem, a multilingual environment for Lamedh/Lambda. The name
derives from the letter that comes after Lamedh --because Mem should
go after Lamedh-- and from Multilingual EnvironMent. Its aim is to
make it possible to write multilingual documents and to provide a
framework where new languages can be added easily by User Groups
and/or interested developers.

This package would not be possible without the previous work of
Yannis Haralambous and John Plaice.

Note that in some places the name Lambda is still used. I expect it
will be removed soon.

This package is not intended for real use, only for testing.

The previous version of the readme file follows, with some changes
in file names. In addition, there is a further sample named russian,
which demonstrates how encoding selection works (UT1/omlgc vs.
T2A/cmr) and how a new level of ocp's can be easily added (in this
case, to transliterate from Latin to Cyrillic).

Javier Bezos
2004/08/15

=======================

Preliminary Remark
~~~~~~~~~~~~~~~~~~

After presenting this work in Tsukuba (March 2001) I have some doubts
about the future of the included files. LaTeX3 (or maybe LaTeX2e*) is
almost here and therefore it doesn't make sense to take a different
path. How lambda will evolve will depend largely on the multilingual
model of LaTeX3; it's not unlikely that lambda, as presented here,
will vanish... but perhaps the files won't and they will become a
package working on top of LaTeX3, or even a multilingual capable
class. As of today it's impossible to say what the future will be.

However, and despite this reservation, I think that the bundled files
are of enough interest, and I would like to get some feedback.

Javier Bezos
2001

Now the original readme follows:

=========================================================

Some remarks.

First of all, will it work?
~~~~~~~~~~~~~~~~~~~~~~~~~~~

Well, some parts will and others will not.
For example, automatic selection of fonts is still at a very early
stage (to be generous) and it will not be correctly synchronized with
running heads; and writing direction will not work at all, simply
because I have not tackled it yet.

I would like to note that I implemented this as fast as possible in
order to have a working package in Tsukuba. The resulting code is
somewhat chaotic and unstable (and sometimes naive), but I hope it
will be enough to begin doing simple experiments.

It works with Omega 1.8.

Files
~~~~~

As you can guess, mem.sty is the kernel of the system. There are
files describing languages, named with the ISO three letter code
(esp.ld, eng.ld, fra.ld and ell.ld), and files describing scripts,
named with the ISO two letter code (la.sd and el.sd).

Regarding TeX, there is a further file with the configuration of the
system: mem.cfg.

Then come otp files. isolat1, isoell and macstd are similar to the
corresponding files currently in Omega, but they can escape to utf8
and ucs16. However, after experimenting a little, I found that
escaping to utf8 is fairly complicated with arbitrary text.

inputtex defines TeX input conventions.

fratext defines (visual) text transformation for French.

The files for Greek are those by Yannis and John with new names
beginning with ell: this is a proposal to systematize names.

OT1.otp, T1.otp and UT1.otp provide translation from Unicode to the
corresponding font encodings. They are very quick and dirty, and in
fact T1 is the same file as UT1 with a few lines added! Accents above
work and may be stacked; accents below don't work at all. OT1 and T1
are used with onfss.sty, since they are not intended to form part of
the core mem.

Finally, a little package named spguill adds spaces before and after
guillemets in non-French text. It requires spguill.otp and
demonstrates the possibilities of the scheme.

mem.tex explains most of the macros, but some of them are not
documented yet.
However, I think that their names are mostly self-explanatory.

Samples
~~~~~~~

greek.tex contains both French and Greek text. The Greek text has been
taken from the Greek TeX Group, so in addition you will learn how to
become a member of it :-). You should note that \MakeUppercase doesn't
work correctly in some places (e.g., the running head with French text
should be unaccented; the problem here is pretty simple: when
\MakeUppercase is called it does not know that the corresponding ocp
will be changed by french. Thus, \frenchtext must in the future see a
"case status" set by \MakeUppercase and behave accordingly). Only
modern monotonic Greek!

yatest.tex prints the date in Spanish, English and US English.

testmisc.tex contains miscellaneous tests.

spguill.tex provides an example for spguill.

Random remarks
~~~~~~~~~~~~~~

- Scripts will have a default dummy language. This way, specific
  actions for this script are possible even if the main language
  uses a different script.
- Currently languages only have one script. However, some languages
  can be written with several scripts (e.g., Azeri [Latin, Arabic,
  Cyrillic] or Spanish [Latin, Hebrew]).
- I'm now studying how to implement macros depending on scripts,
  namely for fonts, case, and so on.
- I'm studying as well how to replace the two level system by a
  three level one (document, paragraph/block, text).
- If you get an error of bad encoding or font, don't worry;
  script specific macros are using the NFSS macros for their own
  purposes and some fine tuning is still to be done.
- Many "auxiliary" files are far from complete. In fact, they are
  fairly incomplete, but I will continue adding more code only once
  we have decided the "right way".
- Currently, the code includes some experiments I've done, mainly:
  - Automatic selection of font encoding based on fd files--if there
    is an fd file for some combination then select it (with certain
    preferences).
    However, it turns out that t1cmr exists but points to _another_
    font, and that ot1omlgc points to a ut1 encoded font. Sometimes
    I give the encoding explicitly with \SetFontEncoding{}{}
  - An escaping mechanism in input encoding otp's, which will allow
    entering Unicode text (ucs16 or utf8) without changing the
    current ocp list (otherwise ligatures and kerning could be
    killed). It works fine when applied to a single char, but I
    didn't manage to extend it to arbitrary text (including non
    expandable primitives--ocp states are not saved).
- There are lots of open questions, and no doubt they will appear
  when discussing Mem.
- The files have been tested on Linux (Red Hat 7.0) and MacOS
  (CMacTeX 3.4).

The original message
~~~~~~~~~~~~~~~~~~~~

The following message was sent to the omega list in October 2000, and
it explains the main aims of mem, with some examples. I've changed my
mind on some points, but most of it remains true, at least in concept.

From: "Javier Bezos"
Date: Thu, 5 Oct 2000 15:11
To: omega@ens.fr
Subject: [omega] Lambda

Hi all,

This is a short description of the goals of Lambda and the way they
will be accomplished. These are very preliminary ideas, and they are
very likely to change before the beta version, which is expected to
be released by the next EuroTeX meeting.

General description
~~~~~~~~~~~~~~~~~~~

The goals of Lambda are mainly:
- to provide a set of high level macros for users and developers
  of language styles, which "hide" the involved primitives and make
  them easier to use.
- to coordinate different languages so that Omega will become a
  true _multilingual_ environment.

This will be done in such a way that you may still use non-Lambda
styles, because you may switch off the internal modifications
(provided, of course, the non-Lambda styles can switch off their
modifications). So, you will be able to say \languageunset, then
switch to another language, and switch back to Lambda with, say,
\languageset{spanish}.
Small pieces of text are inserted with the help of \languagetext,
which is currently essentially the same as \languageset, except that
in the future it could handle writing direction in a somewhat
different fashion.

Let's now explain how TeX handles non-ASCII characters. TeX can read
Unicode files, as xmltex demonstrates, but non-ASCII chars cannot be
represented internally by TeX this way. Instead, it uses macros which
are generated by inputenc, and which are expanded in turn into a true
character (or a TeX macro) by fontenc:

    é --- inputenc --> \'{e} --- fontenc --> ^^e9

That's true even for Cyrillic, Arabic, etc. characters!

Omega can represent non-ASCII chars internally and hence actual chars
are used instead of macros (with a few exceptions). Trivial as it may
seem, this difference is in fact a HUGE difference. For example, the
path followed by é will be:

    é --an encoding ocp-|           |-- T1 font ocp--> ^^e9
                        +-> U+00E9 -+
    \'e -fontenc (!)----|           |- OT1 font ocp -> \OT1\'{e}

It's interesting to note that fontenc is used as a sort of input
method! (Very likely, a package with the same functionality but with
a different name will be used.)

For that to be accomplished using ocp's we must note that we can
divide them into two groups: those generating Unicode from an
arbitrary input, and those rendering the resulting Unicode using
suitable (or maybe just available :-) ) fonts. The Unicode text may
thus be analyzed and transformed by external ocp's at the right place.
Lambda further divides these two groups into four (to repeat, these
proposals are liable to change):

1a) encoding: converts the source text to Unicode.
1b) input: sets input conventions. Keyboards have a limited number
    of keys, and hands a limited number of fingers. The goal of
    this group is to provide an easy way to enter Unicode chars
    using the most basic keys of keyboards (which means ASCII
    chars in Latin ones). Examples could be:
      * --- => em-dash (a well known TeX input convention).
      * ij => U+0133 (in Dutch).
      * no => U+306E [the corresponding hiragana char]

Now we have the Unicode (with TeX tags) memory representation, which
has to be rendered:

2a) writing: contextual analysis, ligatures, spaced punctuation
    marks, and so on.
2b) font: conversion from Unicode to the local font encoding or
    the appropriate TeX macros (if the character is not available
    in the font).

This scheme fits well with the Unicode Design Principles, which state
that Unicode deals with memory representation and not with text
rendering or fonts (which is left to "appropriate standards"). Hence,
most so-called Unicode fonts cannot properly render text in many
scripts because they lack the required glyphs.

There are some additional processes to "shape" changes (case, script
variants, etc.).

User interface
~~~~~~~~~~~~~~

Some of the features of Lambda will be:

* You can switch between languages freely. You need not take care
  of head lines or toc and bib entries--the right language is
  always used. You can even get only a few commands from a language,
  not all of them, because they belong to one of several groups:
  layout, date, names, text, tools, math, and so on.
* Dialects--small language variants--are supported. For instance:
  demotic and katharevusa.
* Customization is quite easy---just redefine a command of a
  language with |\renewcommand| when the language is in force. The
  new definition will be remembered, even if you switch back and
  forth between languages.
* A unique layout can be used throughout the document.

Commands are pretty simple, and I've given some hints above. An
example will be illustrative:

    \documentclass{book}

    \usepackage[encoding=latin1,arabic,english]{lambda}
    % The encoding in the document is latin1, except if
    % overridden explicitly by a language.

    \languageset*{english}
    % Set english as the main language.
    % All groups are activated. Not strictly necessary, because
    % it's automatically done by the package.
    \languageproperties{arabic}{input=latin}
    % Let Lambda translate from latin to arabic.

    \begin{document}

    An Arabic text:
    \languagetext{arabic}{Abû al-Layth al-Samarqand"}.
    And again English.
    % The layout, date and names groups are still those of
    % english.

    \end{document}

Developer interface
~~~~~~~~~~~~~~~~~~~

Here are a few examples of code for a style file:

    \DeclareLanguage{dutch}   % Setting things up

    \SetEncodingDefault{latin1}  % the encoding used in this
    % file, to make sure that the text written here is
    % correctly transcoded.

    \DeclareLanguageCommand*{names}
      {\listfigurename}{Lijst van figuren}
    % The star means that \listfigurename will select latin1
    % even if the encoding in the document is, say, applemac.

    \DeclareLanguageCommand{text}{\dots}{\mbox{...}}

    \SetLanguageProcess{input}{texinput,ndlinput}
    % Two files for input conventions:
    % - texinput provides ---, --, etc.
    % - ndlinput provides ij => U+0133

There are many other commands to handle:

- properties:    \DeclareLanguageProperty
- shape changes: \DeclareShapeProcess
- dates:         \DeclareDateCommand
                 \DeclareDateFunction
                 \DeclareDateFunctionDefault
- values:        \SetLanguageValue
                 \SetLanguageCode [\catcode, \sfcode, etc.]

___________________________________________________________
Javier Bezos              | TeX y tipografia
jbezos at wanadoo dot es  | http://perso.wanadoo.es/jbezos/
...........................................................
CervanTeX  http://apolo.us.es/CervanTeX/CervanTeX.html
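
Illustrative sketch
~~~~~~~~~~~~~~~~~~~

The four process groups described in the message (1a encoding, 1b
input, 2a writing, 2b font) can be pictured as a chain of text
transformations. The Python sketch below is only an analogy and is
not Omega or Mem code: every function name and table in it is
hypothetical, the "fonts" are crude stand-ins defined by code point
ranges, and the writing stage borrows the guillemet spacing idea from
spguill merely as a convenient example.

```python
import unicodedata

# Hypothetical tables for illustration; real ocp files are far richer.
INPUT_CONVENTIONS = [
    ("---", "\u2014"),  # em dash, the TeX convention quoted in the message
    ("ij", "\u0133"),   # Dutch ij, as in the message's example
]
TEX_ACCENTS = {"\u0301": "\\'", "\u0300": "\\`", "\u0302": "\\^", "\u0308": '\\"'}


def encoding_stage(raw: bytes, source_encoding: str = "latin-1") -> str:
    """1a) encoding: convert the source bytes to Unicode."""
    return raw.decode(source_encoding)


def input_stage(text: str) -> str:
    """1b) input: apply keyboard-friendly input conventions."""
    for typed, char in INPUT_CONVENTIONS:
        text = text.replace(typed, char)
    return text


def writing_stage(text: str) -> str:
    """2a) writing: spaced punctuation (no-break spaces inside guillemets)."""
    return text.replace("\u00ab", "\u00ab\u00a0").replace("\u00bb", "\u00a0\u00bb")


def font_stage(text: str, in_font) -> str:
    """2b) font: keep chars the font has, else fall back to a TeX accent macro."""
    out = []
    for ch in text:
        decomposed = unicodedata.normalize("NFD", ch)
        if in_font(ch) or len(decomposed) != 2 or decomposed[1] not in TEX_ACCENTS:
            out.append(ch)  # available, or no fallback known: pass through
        else:
            base, accent = decomposed
            out.append(f"{TEX_ACCENTS[accent]}{{{base}}}")
    return "".join(out)


def render(raw: bytes, in_font) -> str:
    """Chain the four groups in the order given in the message."""
    return font_stage(writing_stage(input_stage(encoding_stage(raw))), in_font)


def ascii_only(ch: str) -> bool:    # rough stand-in for an OT1-like font
    return ord(ch) < 128


def latin1_range(ch: str) -> bool:  # rough stand-in for a T1-like font
    return ord(ch) < 256
```

With these assumptions, render(b"caf\xe9", ascii_only) gives caf\'{e},
while render(b"caf\xe9", latin1_range) keeps the accented character,
which mirrors the OT1 and T1 branches of the diagram in the message.
The point of the split is that the first two stages know nothing
about fonts and the last two know nothing about the source encoding.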