Digital Texts / Digital Codes

        Remember our "reading vs. seeing" exercise with the toothpaste packages?  Look at this Web page, don't read it.  What do you see?  How is it produced before your eyes?  You have been trained to accept as natural that when you apply the point of a ballpoint pen to a piece of wood-pulp paper it will (if properly charged with ink) create a predictable width line for as long as you press down.  You also have learned to form an alphabet of characters with that "stylus," so as to create words and sentences and paragraphs according to our culture's format for visually representing speech.  The stylus/paper technology has been stable for about 800 years, despite occasional innovations in manufacture of the substrate (what is "paper" made of?) and the stylus (reeds, quills, mechanical "fountains," ball-point, gel inks).  But you also know how to type, though few of you probably ever have operated a "typewriter," the original technology that put the power of the printing press at the fingers of an individual writer.  So how does the Internet, or your word processing program or email program, record, store, and reproduce the characters you type and read, like these?  Trust me--it has very little relationship to the typewriter's simple, "analog" mechanical structure and direct-to-paper transfer of inked type.  As you read this page, you are running many layers of computer code which, invisibly to you, translate my keystrokes into storable data and retranslate them into readable characters.

Primitive Machine-Level Codes That Talk Directly to the Micro-Processor--be sure to click on the hyperlinks to see what these codes look like!:

Machine Code: the first-generation programming language, which talks to the machine directly in binary code, usually written out in hexadecimal notation for human readers.  Programmers wrote it by hand in the 1950s, until assembler codes were developed to address it for them.  (The machine code example comes from Dr. Carl Burch's "SymHymn Machine Code Example Program" supporting the SymHymn site at the Science of Computing Suite, Computer Science Department, Hendrix College.)
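To make the idea concrete, here is a minimal sketch in Python (not SymHymn itself; the opcodes and the one-register design are invented for illustration) of what it means for a processor to "execute" machine code: the program is nothing but numbers, and the machine blindly does whatever each number tells it to.

```python
# A toy machine with a single register (the "accumulator").
# Each instruction is a pair of numbers: an opcode and an operand.
# The opcode values (0x01, 0x02, 0x03) are invented for this sketch.

def run(program):
    """Execute a list of (opcode, operand) pairs and return the result."""
    acc = 0  # the accumulator register starts at zero
    for opcode, operand in program:
        if opcode == 0x01:    # LOAD: put the operand into the accumulator
            acc = operand
        elif opcode == 0x02:  # ADD: add the operand to the accumulator
            acc += operand
        elif opcode == 0x03:  # SUB: subtract the operand
            acc -= operand
        else:
            raise ValueError(f"unknown opcode {opcode:#04x}")
    return acc

# The "program" below is just numbers, written here in hexadecimal:
# LOAD 10, then ADD 5, then SUB 3.
result = run([(0x01, 10), (0x02, 5), (0x03, 3)])
print(result)  # 12
```

Notice that nothing in the number 0x02 "means" addition; the machine's wiring (here, the interpreter loop) simply treats it that way, which is why raw machine code is so hard for humans to read and write.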

Assembler Code: the second-generation programming language, a set of mnemonic abbreviations that a program called an "assembler" translates into machine code, so that programmers did not have to write machine code directly.  The example contains lines beginning with a semi-colon followed by English-language text--the semi-colon tells the computer to ignore the line because it contains notes for the human programmers who have to maintain the program; without that marker, the program would "crash" (fail to execute its instructions).  Note the missing word "to" in the final sentence of the "Definition Phase" explanation!  We can still process that sentence's instruction by inferring the missing word from context, but a computer encountering such a gap in a program would halt the program and generate an error message calling for that line of the program to be "debugged."  (The Assembler code example comes from James Hamblen's book chapter "Introduction to WinTim."  [Georgia Tech School of Electrical and Computer Engineering])
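The two conventions described above--mnemonics standing in for numeric opcodes, and semi-colon lines ignored as human-only comments--can be sketched in a few lines of Python.  This is not a real assembler; the mnemonics and opcode numbers are invented for illustration.

```python
# A minimal sketch of what an assembler does: translate mnemonic lines
# into numeric machine code, skipping comment lines that begin with ";".
# Mnemonics and opcode values here are invented, not a real instruction set.

OPCODES = {"LOAD": 0x01, "ADD": 0x02, "SUB": 0x03}

def assemble(source):
    machine_code = []
    for line in source.splitlines():
        line = line.strip()
        if not line or line.startswith(";"):
            continue  # a note for human readers -- the machine ignores it
        mnemonic, operand = line.split()
        if mnemonic not in OPCODES:
            # a "bug" (like a missing or garbled word) halts assembly
            raise SyntaxError(f"unknown mnemonic: {mnemonic}")
        machine_code.append((OPCODES[mnemonic], int(operand)))
    return machine_code

source = """
; Add two numbers (this comment line is ignored by the assembler)
LOAD 10
ADD 5
"""
print(assemble(source))  # [(1, 10), (2, 5)]
```

A misspelled mnemonic raises an error and stops the translation cold, which is the behavior the paragraph above describes: computers cannot infer a missing word from context the way human readers can.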

Higher Level Programming Languages: compilers translate these languages, which human beings find easier to write without errors, down into assembler or machine code.  Some of the earliest, dating from the late 1950s and early 1960s, were FORTRAN, LISP, COBOL, and ALGOL.  These relatively primitive languages were used to program the first major business machines and some (esp. COBOL) are still in use today, though buried beneath a "shell" of translation code that permits non-COBOL-writing programmers to operate them.  If such a program develops a "bug," perhaps because the input data contains new, undefined information, an experienced programmer who knows COBOL may have to dive deep into the code to repair it.  The alternative is writing a brand new program in a modern higher level language.  The first language specifically developed for artificial intelligence (AI) programming was LISP.  Modern AI programmers currently (2021) use a language named Python.  Other contemporary (2021) higher level languages are Delphi, Perl, Ruby, and Java.  (The Ruby code examples come from Dr. Thomas Bennett's Web page, Mississippi College Computer Science Dept.)
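To see why higher level languages are easier to write without errors, compare a single Python statement with the step-by-step spelling of the same computation that a machine-level program would have to perform:

```python
# One high-level statement does work that would take many machine-level
# steps (a loop, a register, repeated additions):
numbers = [3, 1, 4, 1, 5]
total = sum(numbers)  # the language handles the looping for us
print(total)  # 14

# The equivalent "low-level" spelling of the same computation,
# one small step at a time:
total = 0
for n in numbers:
    total = total + n
print(total)  # 14
```

Fewer hand-written steps means fewer chances for the kind of typo that, as with the assembler example above, would halt the program.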

Higher Text Programming Codes That Tell the Micro-Processor How to Store, Display, and Print Text:

ASCII Code:  In 1963, a committee of scientists working for the American Standards Association, ancestor of ANSI (American National Standards Institute)--the body that creates and evaluates all sorts of basic operating codes, standards, and practices used in business and scientific research--agreed upon the American Standard Code for Information Interchange, a numerical code for representing text.  Its seven-bit "binary code" (usually stored in eight-bit bytes) is among the oldest and most basic instructions for transmitting digital text, and it was based on well-established teletype codes that had been working since the 1950s.  Created for Teletype machines that repeated news stories, typed from reporters' original copy at a central location, to networks of newsrooms around the world, ASCII characters told these receiving electronic typewriters what to type, as well as when to indent, skip a line, or ring a bell to signal an important news story.  (At 33 seconds, you will hear one bell to signal an ordinary presidential news conference summary, and at 2:25, a "two-bell" story is announced telling the world that the U.S. soccer team had been eliminated by Ghana from the World Cup.)  United Press International [UPI] rated stories by their bell count: "Bulletin" or "Urgent" stories got five bells; the Kennedy assassination and FDR's death were "Flash" stories, fifteen bells.  Because computers can only perform mathematical operations like addition and subtraction, or logical operations that can be represented mathematically (Boolean AND/OR/NOT sorting), all text you see on a computer screen first was a number in machine code which referred to a character in a font table that was, itself, represented by numbers telling the computer what shape to draw on the screen and where to put it.
A capital "A," for instance, is "01000001," and a small "a" is "01100001."  When you tell MS-Word to save a file as "Text Only," you are saving only the ASCII characters without other formatting.
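You can verify those numbers yourself: Python's built-in ord() and chr() functions expose the numeric code behind every character, and format() can show it in binary.

```python
# Every character on screen is first a number.  ord() gives the code,
# chr() goes the other way, and format(..., "08b") shows the binary form.
print(ord("A"))                  # 65
print(format(ord("A"), "08b"))   # 01000001
print(ord("a"))                  # 97
print(format(ord("a"), "08b"))   # 01100001
print(chr(65))                   # A

# Upper- and lower-case letters differ by exactly one bit (the 32s place):
print(ord("a") - ord("A"))       # 32
```

That single-bit difference between "01000001" and "01100001" is why case-insensitive sorting was cheap even on 1960s hardware: flipping one bit converts a capital to its small letter.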

Waterloo Script  A pre-WYSIWYG (What-You-See[on the screen]-Is-What-You-Get[when you print the document]) word processing system for mainframe computers.  The user had to master at least a basic set of Script mnemonic codes in order to get the document to print out legibly, and those codes would only be activated when the document was sent ("spooled") to the mainframe computer's line printer, a single high-speed device that served the entire community's printing needs.  Printing delays often were measured in hours.  Script reversed the "comment" convention (see Assembler above): when the machine saw a period in the left margin, it would assume it was reading a machine instruction code (e.g., ".pp;" for paragraph), and anything following the semi-colon, or occurring on a line that does not begin with a period, was treated as plain text.  In time, Script became the basis for GML or Generalized Markup Language, the ancestor of HTML (below) and XML.  GML instructions started with a colon in the left margin to distinguish them from Script instructions, but otherwise their logic was very similar to Script's.  Click here for the University of Waterloo SCRIPT features and system requirements page (1990).
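Script's convention--a period in the left margin marks a formatting command, everything else is text to be printed--can be sketched in a few lines of Python.  The ".pp;" command is taken from the paragraph above; everything else here is illustrative.

```python
# A sketch of Waterloo Script's line convention: a period in the left
# margin marks a formatting command; any other line is plain text.

def classify(document):
    """Split a Script-style document into command lines and text lines."""
    commands, text = [], []
    for line in document.splitlines():
        if line.startswith("."):
            commands.append(line)  # e.g. ".pp;" -> start a new paragraph
        else:
            text.append(line)      # plain text, printed as-is
    return commands, text

doc = ".pp;\nThe quick brown fox\njumps over the lazy dog."
cmds, body = classify(doc)
print(cmds)  # ['.pp;']
print(body)  # ['The quick brown fox', 'jumps over the lazy dog.']
```

GML's colon-in-the-margin convention works the same way; only the marker character changes.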

HTML  Hyper-Text Markup Language, a descendant of Waterloo Script via "SGML" (Standard Generalized Markup Language), the first attempt to standardize all digital document formatting, HTML is the standard language used to create web pages.  If you go to Internet Explorer while viewing this page and click on the "View" menu at the top, then click on "View Source," you will see the source code behind the text that is displayed in WYSIWYG on your screen and your printout.  To see it in Mozilla Firefox, go to Tools-->Web Developer-->View Source.  All of that should remain invisible to you when you read digital text, even as the type compositor's assembly of a string of lead type units on a composing stick or the printer's pulling the tympan down upon the paper and plate is invisible to the reader of a printed book, or the scribe's individual pen strokes and preparation of a calf skin to become parchment for writing is invisible to the reader of a manuscript.  Early versions of MS-Word would show users its markup codes, as well, but this word processing program has been WYSIWYG so long that this is no longer considered by Microsoft to be necessary for ordinary users of the program.
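The separation between invisible markup and visible text can be demonstrated with Python's standard-library HTML parser, which hands the "displayed" text to one method and silently consumes the tags.  The sample sentence below is invented for illustration.

```python
# Separating HTML markup (invisible to the reader) from the text a
# browser actually displays, using the standard-library HTMLParser.
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.text = []

    def handle_data(self, data):
        # called only for the text between tags -- the tags themselves
        # (<p>, <b>, etc.) never reach the reader
        self.text.append(data)

source = "<p>Digital <b>texts</b> hide their codes.</p>"
parser = TextExtractor()
parser.feed(source)
print("".join(parser.text))  # Digital texts hide their codes.
```

"View Source" simply shows you the left-hand side of that transformation: the same characters your browser reads, before the tags are stripped away and turned into layout.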

XML  Extensible Markup Language (ca. 1998), the current descendant of SGML and Script, this is the newest standard digital code for creating Web based documents and other artifacts, including those which contain video and audio.  XML "tags" look very much like those of HTML, but they are language-independent (tags and data can use non-Roman characters) and they can interrelate information of the same type across many documents.
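A small example, parsed with Python's standard-library XML module, shows both points: the tags label what each piece of data *is*, and non-Roman characters are perfectly legal content.  The element names below are invented for illustration.

```python
# A sketch of XML's tagged structure: tags describe the data they wrap,
# and the content is language-independent (note the Old English "Ænglisc").
import xml.etree.ElementTree as ET

record = """<book>
  <title>Beowulf</title>
  <language>Ænglisc</language>
</book>"""

root = ET.fromstring(record)
print(root.find("title").text)     # Beowulf
print(root.find("language").text)  # Ænglisc
```

Because every library, publisher, or archive tagging its records with the same element names produces machine-comparable data, XML is what lets "information of the same type" be interrelated across many documents.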

MARC (a specialized code for running library catalogs): You can see the MARC code for any library catalog search result by clicking on the "MARC Display" on the top menu buttons.  Cataloging librarians tend to find the MARC view easier to read, because it's the form of document they write!

        If you want to learn more about code and coders (those engineers who write code), set aside a few hours to read Paul Ford's long-form article, "What Is Code?," Businessweek, June 11, 2015, available at http://www.bloomberg.com/graphics/2015-paul-ford-what-is-code/.  Because of its use of graphics to explain coding concepts, you cannot print it out for reading, even though it is the length of a print novella or long short story.  Ford attempts to explain code to business people who work with coders but who are increasingly unable to understand what their colleagues are doing and saying.  The very existence of the article signals an impending shift in commercial software development.  Coders need business types to help them make money from their inventions, and business types need coders to invent products they can sell, but when the two can no longer speak the same language, something big may be about to happen.  Ford can introduce you to much more powerful coding languages than the few mentioned above, such as "C," "C++," and "Python."  HTML and XML are really about controlling how text displays in Web based applications, as opposed to the robust programming languages that actually run the world you live in, from your cell phone and laptop to airplanes and elevators and not a few vending machines on campus.