Introduction
Mathematicians have developed a very efficient way to communicate mathematical knowledge. They use a kind of partially formalised natural language and, most importantly, a way to lay out formulae on a two-dimensional surface.
Evolving technology has made it easier to publish mathematical knowledge. The typesetting system TeX made it possible to produce high-quality typeset mathematical texts on a computer. Together with LaTeX, a macro package for TeX, it became efficient for the average mathematician to write such texts. While printing is still important, a new way of sharing these texts has become popular: sharing them in electronic form on the World Wide Web. This raises the question of how to manage this knowledge and how mathematics can benefit from the capabilities of the web.
We are far from having exhausted the full potential of the web with respect to the management and communication of mathematical knowledge. In fact, we are just at the beginning.
Advantages
This section is intended to grab your attention and to encourage you to read on. Therefore, it mentions things that are possible in theory but not yet practical. However, work in this direction is under way.
Here, we are interested in the new possibilities the web gives us, not in things we can already do on paper. For example, students filling out quizzes and checking their answers is nothing new: the same can be achieved by handing out sheets of paper with the questions, and later the answers, so that the students can check their answers themselves. It would certainly be faster to do the same using computers, but this has nothing to do with the web. The internet allows the students to do the quiz at home, but again, this is provided by the internet, not the web.
The core of the web is linking, hence its name. Assume that a formula uses some symbol and you don't know what it means. If it is linked to its definition, you can just select it to learn about it. You can also search for other definitions of the same symbol. A proof can link to all theorems it uses, so you can look them up without hassle. Each proof step can be linked to all the information it makes use of. This can sometimes help a human reader, and if done carefully, it can be useful for machines too. Thanks to linking, it is also possible to do a web search for other proofs of the same theorem or for theorems about the same concept.
Another important idea of the web is that content is presented to the user in an adaptable manner. You can read the same document on a PC, a tablet or a phone, all without having to scroll the document horizontally: the width of the text and formulas is adapted automatically to your display. It is even possible to let the computer read the text aloud. Depending on the speed and available resources of your device, formulas are rendered either quickly or with high quality. Colors and fonts can be changed automatically to suit your taste and to integrate into your desktop. Notations used in formulas can be adapted to what you are used to. (Did you ever mess things up because you took a ⊂ for a ⊆ although the author meant ⊊?) Formulas can even be adapted to your knowledge. If you are new to the topic, all details can be shown. If you are more advanced, obvious things can be left out.
On the web, one tries to separate meaning from presentation. In other words, one writes down the meaning and, in addition, creates rules for how the meaning shall be presented. Above, we mentioned searching for definitions of a symbol. Because the meaning of this symbol is known, we can search for equivalent definitions, not only for definitions of something that happens to use the same character in a formula. We also mentioned that you can adapt formulas to your knowledge. This is done by changing the rules used for generating the presentation. These rules can be reused, so you can define rules for common mathematical concepts once and have them applied in all documents you read. A computer can understand something much better if its meaning is given instead of only its presentation. This allows a computer to prove theorems on its own or to help you with proving them.
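The separation of meaning and presentation can be illustrated with a small Python sketch. It is only a toy under stated assumptions: the class `Apply`, the rule tables `NOTATION_A` and `NOTATION_B`, and the symbol names `subset` and `proper_subset` are made up for this example and do not come from any standard. The formula is stored once as a meaning, and the notation rules decide which character the reader sees.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Apply:
    """Hypothetical content-level node: a named meaning applied to arguments."""
    symbol: str   # name of the meaning, e.g. "proper_subset"
    args: tuple   # sub-expressions (plain strings are treated as variables)


# Two hypothetical sets of presentation rules for the same meanings.
# Readers used to different conventions would pick different tables.
NOTATION_A = {"subset": "{0} ⊆ {1}", "proper_subset": "{0} ⊂ {1}"}
NOTATION_B = {"subset": "{0} ⊂ {1}", "proper_subset": "{0} ⊊ {1}"}


def render(expr, rules):
    """Turn a meaning tree into a string using the given notation rules."""
    if isinstance(expr, str):
        return expr  # a variable is shown as its own name
    rendered_args = [render(arg, rules) for arg in expr.args]
    return rules[expr.symbol].format(*rendered_args)


# The meaning is written down once ...
formula = Apply("proper_subset", ("A", "B"))

# ... and presented according to the reader's preferred rules.
print(render(formula, NOTATION_A))  # A ⊂ B
print(render(formula, NOTATION_B))  # A ⊊ B
```

Note that the character ⊂ appears in both tables with different meanings. A search based on the stored symbol name would not confuse the two, whereas a search for the character would.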
Presentation and Semantics
It is important to understand the difference between the presentation of a mathematical formula and its meaning. The presentation is what the mathematician writes on the sheet of paper and the meaning is the thing in his mind he wants to communicate.
How someone writes down a formula depends on many things: cultural background, personal preferences, the topic of the text, the context the formula appears in and so on. It even depends on the intended audience, on the time and equipment available for writing and on much more.
On the other hand, if you are shown an isolated formula, it is impossible to determine its meaning, because the meaning depends on the context. If you have the context, say the whole book or the whole paper, then you can understand the formula, provided you understand the mathematics behind it. Because a computer usually does not understand the mathematics, it has trouble inferring the meaning even when the context is available.
The most straightforward way to fix this problem is to give the meaning explicitly. This can be done by using a language like OpenMath. OpenMath represents the meaning without ambiguity.
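To give an impression of what such an explicit encoding looks like, here is a minimal Python sketch that builds the OpenMath XML for "x + 1" with the standard library. The helper functions are hypothetical names chosen for this example. The element names (OMOBJ, OMA, OMS, OMV, OMI) and the symbol `plus` from the content dictionary `arith1` follow the OpenMath XML encoding as I understand it; the namespace declaration that a complete OpenMath object carries is left out for brevity.

```python
import xml.etree.ElementTree as ET


def om_symbol(cd, name):
    """A symbol whose meaning is fixed by a content dictionary (cd)."""
    return ET.Element("OMS", {"cd": cd, "name": name})


def om_variable(name):
    """A variable, identified by its name only."""
    return ET.Element("OMV", {"name": name})


def om_integer(value):
    """An integer literal."""
    element = ET.Element("OMI")
    element.text = str(value)
    return element


def om_apply(head, *args):
    """Application of `head` to the arguments `args`."""
    node = ET.Element("OMA")
    node.append(head)
    node.extend(args)
    return node


# x + 1: the symbol "plus" from "arith1" applied to a variable and an integer.
obj = ET.Element("OMOBJ")
obj.append(om_apply(om_symbol("arith1", "plus"), om_variable("x"), om_integer(1)))

xml_text = ET.tostring(obj, encoding="unicode")
print(xml_text)
```

The point is that the plus sign is not a character but a reference to a definition. Whether it is later shown as "+", read aloud or handed to a computer algebra system is a separate decision.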
Strategies
There are several strategies for publishing mathematical content on the web, some of them taking more advantage of the web than others. This list is, of course, not complete, but it covers the ones I know about.
LaTeX and PDF
LaTeX has become very popular because of its high-quality visual output and because a mathematical text can be written rather quickly. However, LaTeX captures only the (visual) presentation. These days, mathematical papers are usually written in LaTeX and published as PDF.
HTML plus raster graphics
This is the only strategy that has all drawbacks at once and virtually no advantages. I still wonder why it is used so frequently. The only reason that comes to my mind is that the browsers widely used are completely outdated or useless and hence people resort to images.
The drawbacks are obvious. Images cannot be scaled. It is not possible to cut and paste a formula. Images cannot be styled using stylesheets. They take a long time to load. They do not render at the right position: the baseline is often shifted vertically.
HTML plus vector graphics
This is certainly an improvement over raster graphics, since it allows arbitrary scaling without quality loss. Unfortunately, only newer browsers support SVG.
XHTML plus Presentation MathML
For the end user this is the best way to go. Personally, I think that the user can be expected to use a browser that supports MathML. Even Microsoft Internet Explorer supports MathML thanks to the MathPlayer plugin. (Though Internet Explorer still fails to handle documents with the MIME type application/xhtml+xml.)
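For readers who have never seen Presentation MathML, the following Python sketch generates the markup for "x² + 1" using only the standard library. The helper `leaf` and the function `build_x_squared_plus_one` are names invented for this example. The elements used (math, mrow, msup, mi, mn, mo) are standard Presentation MathML elements. The result is a fragment that would be embedded in an XHTML page, not a complete document.

```python
import xml.etree.ElementTree as ET

MATHML_NS = "http://www.w3.org/1998/Math/MathML"


def leaf(tag, text):
    """Create a token element such as <mi>x</mi>."""
    element = ET.Element(tag)
    element.text = text
    return element


def build_x_squared_plus_one():
    """Build Presentation MathML for the formula x^2 + 1."""
    math = ET.Element("math", {"xmlns": MATHML_NS})
    row = ET.SubElement(math, "mrow")       # horizontal group
    power = ET.SubElement(row, "msup")      # base with superscript
    power.append(leaf("mi", "x"))           # identifier
    power.append(leaf("mn", "2"))           # number
    row.append(leaf("mo", "+"))             # operator
    row.append(leaf("mn", "1"))
    return math


markup = ET.tostring(build_x_squared_plus_one(), encoding="unicode")
print(markup)
```

Because the formula is made of ordinary elements inside the page, it scales with the text, can be selected and copied, and is reachable from stylesheets and scripts, which is exactly what images cannot offer.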
OMDoc
XHTML is missing elements for important structures in a mathematical text, namely definitions, assertions (theorems, lemmata), proofs and examples. OMDoc makes this structure explicit. Furthermore, proofs are properly linked to the assertions they prove, definitions are properly linked to the symbols they define and so on.
OMDoc allows one to be less formal, using natural language and formulae in a presentational form. It is also possible to be more formal by replacing the formulae with their semantic counterparts and employing notation definitions for presentation to the user. Finally, it is possible to completely formalize all or part of the document, using the logic system of your choice.
As a final point, OMDoc takes linking very seriously. Every theory and symbol gets a URI. It is possible to import other theories into a theory, even if they are part of another document. Imported theories can be adapted to the current one using theory morphisms. For example, a theory about fields can import a theory about groups adapted to the addition in the field, and the same theory a second time adapted to the multiplication in the field.
OMDoc is an application of XML and makes use of OpenMath or MathML. Therefore it is very cumbersome for a human to edit it with a simple text editor.
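The idea of importing one theory twice under different morphisms can be sketched in a few lines of Python. This is a toy model and not OMDoc syntax: the dictionary layout, the function `apply_morphism` and all symbol names are assumptions made for illustration. It also ignores that the multiplicative group of a field excludes zero.

```python
import re

# Hypothetical theory of groups: symbol names plus informal axioms.
GROUP = {
    "symbols": ["op", "unit", "inverse"],
    "axioms": ["op(x, unit) = x", "op(x, inverse(x)) = unit"],
}


def apply_morphism(theory, morphism):
    """Return a copy of `theory` with its symbols renamed by `morphism`."""
    def rename(text):
        # Replace whole identifiers only, leaving unknown names untouched.
        return re.sub(
            r"[A-Za-z_]+",
            lambda match: morphism.get(match.group(0), match.group(0)),
            text,
        )

    return {
        "symbols": [morphism.get(s, s) for s in theory["symbols"]],
        "axioms": [rename(axiom) for axiom in theory["axioms"]],
    }


# The same theory imported twice, each time through a different morphism.
additive = apply_morphism(
    GROUP, {"op": "plus", "unit": "zero", "inverse": "neg"})
multiplicative = apply_morphism(
    GROUP, {"op": "times", "unit": "one", "inverse": "recip"})

# The field theory reuses both copies and adds only what is new.
FIELD = {
    "symbols": additive["symbols"] + multiplicative["symbols"],
    "axioms": additive["axioms"] + multiplicative["axioms"]
    + ["times(x, plus(y, z)) = plus(times(x, y), times(x, z))"],
}

for axiom in FIELD["axioms"]:
    print(axiom)
```

The group axioms are written once and arrive in the field theory in two renamed copies, so anything proved about groups in general carries over to both operations.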
sTeX
People who are familiar with LaTeX and want to write semantics should consider this option. One can generate beautiful PDF documents from it, as one is used to from LaTeX, and one can also generate OMDoc from it. (OMDoc in turn can then be converted to XHTML+MathML.) Unlike OMDoc, sTeX can be edited comfortably by a human in a text editor, nearly as easily as LaTeX.
Why not just TeX?
When talking about how to handle mathematical formulae on the web, this is a question often heard. Indeed, many people wonder why MathML does not just use TeX instead of its verbose XML syntax.
As a first step towards answering this question, we have to think about what TeX actually is. On the one hand, TeX is a typesetting system. (Remember the old days of printed books: the typesetter had to place tiny metallic letters on a stick. This is basically what TeX does. And for that, it needs the letters, i.e. fonts.) On the other hand, it is a macro processor. Yes, the language TeX uses is a programming language. However, TeX itself is too basic to be useful. You really need a lot of macros to work efficiently. There are many macro packages.
If MathML used TeX, a browser would have to implement TeX and provide many macro packages and even fonts. The problems this creates follow naturally:
- Since TeX is a programming language, one has to care about many things (as with JavaScript): endless loops, resource exhaustion, security issues.
- Many macros must be available and this set must be standardized. Otherwise every browser will support a different set of macros and we get a nightmare of incompatibility.
- A variety of fonts would have to be available. And imagine what happens when a mathematician has the bright idea to use yet another symbol.
- TeX would have to be integrated into the layout engine of the browser somehow. (Computing the length of a formula, breaking it at the right places, getting the baseline to match the surrounding text.)
- How would it be possible to dynamically change formulae using JavaScript? How would one apply CSS to formulae?
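The first point in the list above can be made concrete with a toy macro expander in Python. It is not TeX and does not model TeX's real expansion rules. The function `expand`, the exception `ExpansionLimitExceeded` and the macros `\half` and `\forever` are hypothetical. The sketch only shows that any engine which expands user-defined macros has to guard against definitions that never terminate, such as the TeX definition \def\forever{\forever}.

```python
class ExpansionLimitExceeded(Exception):
    """Raised when macro expansion does not finish within the step budget."""


def expand(tokens, macros, max_steps=1000):
    """Expand macros left to right until only plain tokens remain.

    `macros` maps a macro name to its replacement token list.
    `max_steps` is the guard a browser-like host would need.
    """
    pending = list(tokens)
    output = []
    steps = 0
    while pending:
        token = pending.pop(0)
        if token in macros:
            steps += 1
            if steps > max_steps:
                raise ExpansionLimitExceeded(
                    f"gave up after {max_steps} expansions")
            # Put the replacement back in front; it may contain macros too.
            pending = list(macros[token]) + pending
        else:
            output.append(token)
    return output


macros = {
    r"\half": ["1", "/", "2"],      # harmless abbreviation
    r"\forever": [r"\forever"],     # expands to itself, never terminates
}

print(expand(["x", "+", r"\half"], macros))  # ['x', '+', '1', '/', '2']

try:
    expand([r"\forever"], macros)
except ExpansionLimitExceeded as error:
    print("aborted:", error)
```

A step budget like `max_steps` is only one of several guards that would be needed. Memory use and access to files raise similar concerns, which is why a declarative format is much easier for a browser to accept than a programming language.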
Note that, when using TeX, the following issues could be solved by replacing LaTeX with a macro package that addresses them. However, I doubt that people want to use a completely different macro package. So, what are the deficiencies of LaTeX?
- The structure of a formula is represented badly.
- Certain things which are important for aural presentation are left out.
- The meaning of the formula is not encoded, only its visual appearance.
Gain and Effort
The question remains why people do not use the most powerful strategy (OMDoc). The answer is the same as for why mathematicians do not completely formalize their proofs: it comes down to the amount of time it requires and whether it is worth it.
Why do people use LaTeX, and why do people use HTML? In some way, the gain from using them justifies the effort they require. The diagram shows some strategies at their position in the effort/gain space. The diagonal line is the line of acceptance. The strategies above the line are used, the ones below are not. (Note that this line is hypothetical and may have a different gradient depending on the situation.) The goal of the developers is of course to move OMDoc and sTeX in the direction of less effort and more gain.
Workflow
For someone to accept one of the strategies above, it is very important to have a workflow. The usual workflow for LaTeX is roughly to write LaTeX code in a text editor, run LaTeX and check the output with a PDF or DVI viewer. The exact tools used (which editor, which LaTeX distribution, which viewer), which of their features one uses and where they are placed on the desktop are all part of the workflow. So are the computer and operating system one uses, where one works and whether the light is on or not. It is simply the way of working you are used to and good at.
Every person has her own workflow. For LaTeX, most mathematicians have a workflow they are happy with. The reason for this is probably that the required software and documentation are readily available. The same cannot yet be said about OMDoc. sTeX has the very important advantage that the LaTeX workflow can be used for it without too many modifications. One can happily use the same text editor, LaTeX distribution and PDF viewer.