Chronos is one of the many Smalltalk-related blogs syndicated on Planet Smalltalk
χρόνος

Discussion of the Essence# programming language, and related issues and technologies.

2007-12-11

Smalltalk in One Page

Both Travis Griggs and David Buck have recently published their contributions to the "Smalltalk in One Page" project. So I thought I'd provide my condensed specification of Smalltalk syntax (which is excerpted from my Smalltalk primer/tutorial, Smalltalk: Getting The Message):

Smalltalk Syntax: Formal Specification

The complete formal specification of the syntax (grammar) of ANSI-Standard Smalltalk is presented below, using the metalanguage known as Extended Backus-Naur Form (EBNF). The specific flavor of EBNF used is the one defined by the ISO International Standard for EBNF (ISO/IEC 14977).

The EBNF grammar of Smalltalk is presented as a numbered list of production rules. Note that there are only 67 production rules (entry 39 in the list is a comment, not a rule), that five of them simply define aliases (alternative names) solely for conceptual clarity, and that over half of them concern literal values, comments, identifiers and other low-level lexical constructs.

Formal EBNF Specification of Smalltalk Syntax

  1. Character = ? Any Unicode character ?;
  2. WhitespaceCharacter = ? Any space, newline or horizontal tab character ?;
  3. DecimalDigit = "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9";
  4. Letter = "A" | "B" | "C" | "D" | "E" | "F" | "G" | "H" | "I" | "J" | "K" | "L" | "M"
                    | "N" | "O" | "P" | "Q" | "R" | "S" | "T" | "U" | "V" | "W" | "X" | "Y" | "Z"
                    | "a" | "b" | "c" | "d" | "e" | "f" | "g" | "h" | "i" | "j" | "k" | "l" | "m"
                    | "n" | "o" | "p" | "q" | "r" | "s" | "t" | "u" | "v" | "w" | "x" | "y" | "z";
  5. CommentCharacter = Character - '"';
            (* Any character other than a double quote *)
  6. Comment = '"', {CommentCharacter}, '"';
  7. OptionalWhitespace = {WhitespaceCharacter | Comment};
  8. Whitespace = (WhitespaceCharacter | Comment), OptionalWhitespace;
  9. LetterOrDigit =
                    DecimalDigit
                    | Letter;
  10. Identifier = (Letter | "_"), {(LetterOrDigit | "_")};
  11. Reference = Identifier;
  12. ConstantReference =
                    "nil"
                    | "false"
                    | "true";
  13. PseudoVariableReference =
                    "self"
                    | "super"
                    | "thisContext";
            (* "thisContext" is not defined by the ANSI Standard, but is widely used anyway *)
  14. ReservedIdentifier =
                    PseudoVariableReference
                    | ConstantReference;
  15. BindableIdentifier = Identifier - ReservedIdentifier;
  16. UnaryMessageSelector = Identifier;
  17. Keyword = Identifier, ":";
  18. KeywordMessageSelector = Keyword, {Keyword};
  19. BinarySelectorChar = "~" | "!" | "@" | "%" | "&" | "*" | "-" | "+" | "=" | "|" | "\" | "<" | ">" | "," | "?" | "/";
  20. BinaryMessageSelector = BinarySelectorChar, [BinarySelectorChar];

  21. IntegerLiteral = ["-"], UnsignedIntegerLiteral;
  22. UnsignedIntegerLiteral =
                    DecimalIntegerLiteral
                    | Radix, "r", BaseNIntegerLiteral;
  23. DecimalIntegerLiteral = DecimalDigit, {DecimalDigit};
  24. Radix = DecimalIntegerLiteral;
  25. BaseNIntegerLiteral = LetterOrDigit, {LetterOrDigit};
  26. ScaledDecimalLiteral = ["-"], DecimalIntegerLiteral, [".", DecimalIntegerLiteral], "s", [DecimalIntegerLiteral];
  27. FloatingPointLiteral = ["-"], DecimalIntegerLiteral, (".", DecimalIntegerLiteral, [Exponent] | Exponent);
  28. Exponent = ("e" | "d" | "q"), [["-"], DecimalIntegerLiteral];
  29. CharacterLiteral = "$", Character;
  30. StringLiteral = "'", {StringLiteralCharacter | "''"}, "'";
            (* To embed a "'" character in a String literal, use two consecutive single quotes *)
  31. StringLiteralCharacter = Character - "'";
            (* Any character other than a single quote *)
  32. SymbolInArrayLiteral =
                    UnaryMessageSelector - ConstantReference
                    | KeywordMessageSelector
                    | BinaryMessageSelector;
  33. SymbolLiteral = "#", (SymbolInArrayLiteral | ConstantReference | StringLiteral);
  34. ArrayLiteral =
                    ObjectArrayLiteral
                    | ByteArrayLiteral;
  35. ObjectArrayLiteral = "#", NestedObjectArrayLiteral;
  36. NestedObjectArrayLiteral = "(", OptionalWhitespace, [LiteralArrayElement, {Whitespace, LiteralArrayElement}], OptionalWhitespace, ")";
  37. LiteralArrayElement =
                    Literal - BlockLiteral
                    | NestedObjectArrayLiteral
                    | SymbolInArrayLiteral
                    | ConstantReference;
  38. ByteArrayLiteral = "#[", OptionalWhitespace, [UnsignedIntegerLiteral, {Whitespace, UnsignedIntegerLiteral}], OptionalWhitespace,"]";

  39. (* The preceding production rules would usually be handled by the lexical analyzer;
         the following production rules would usually be handled by the parser
    *)
  40. FormalBlockArgumentDeclaration = ":", BindableIdentifier;
  41. FormalBlockArgumentDeclarationList = FormalBlockArgumentDeclaration, {Whitespace, FormalBlockArgumentDeclaration};
  42. BlockLiteral = "[", [OptionalWhitespace, FormalBlockArgumentDeclarationList, OptionalWhitespace, "|"], ExecutableCode, OptionalWhitespace, "]";

  43. Literal = ConstantReference
                    | IntegerLiteral
                    | ScaledDecimalLiteral
                    | FloatingPointLiteral
                    | CharacterLiteral
                    | StringLiteral
                    | SymbolLiteral
                    | ArrayLiteral
                    | BlockLiteral;

  44. NestedExpression = "(", Statement, OptionalWhitespace, ")";
  45. Operand =
                    Literal
                    | Reference
                    | NestedExpression;

  46. UnaryMessage = UnaryMessageSelector;
  47. UnaryMessageChain = {OptionalWhitespace, UnaryMessage};
  48. BinaryMessageOperand = Operand, UnaryMessageChain;
  49. BinaryMessage = BinaryMessageSelector, OptionalWhitespace, BinaryMessageOperand;
  50. BinaryMessageChain = {OptionalWhitespace, BinaryMessage};
  51. KeywordMessageArgument = BinaryMessageOperand, BinaryMessageChain;
  52. KeywordMessageSegment = Keyword, OptionalWhitespace, KeywordMessageArgument;
  53. KeywordMessage = KeywordMessageSegment, {OptionalWhitespace, KeywordMessageSegment};
  54. MessageChain =
                    UnaryMessage, UnaryMessageChain, BinaryMessageChain, [KeywordMessage]
                    | BinaryMessage, BinaryMessageChain, [KeywordMessage]
                    | KeywordMessage;
  55. CascadedMessage = ";", OptionalWhitespace, MessageChain;
  56. Expression = Operand, [OptionalWhitespace, MessageChain, {OptionalWhitespace, CascadedMessage}];

  57. AssignmentOperation = OptionalWhitespace, BindableIdentifier, OptionalWhitespace, ":=";
  58. Statement = {AssignmentOperation}, OptionalWhitespace, Expression;
  59. MethodReturnOperator = OptionalWhitespace, "^";
  60. FinalStatement = [MethodReturnOperator], Statement;
  61. LocalVariableDeclarationList = OptionalWhitespace, "|", OptionalWhitespace, [BindableIdentifier, {Whitespace, BindableIdentifier}], OptionalWhitespace, "|";
  62. ExecutableCode = [LocalVariableDeclarationList], [{Statement, OptionalWhitespace, "."}, FinalStatement, ["."]];

  63. UnaryMethodHeader = UnaryMessageSelector;
  64. BinaryMethodHeader = BinaryMessageSelector, OptionalWhitespace, BindableIdentifier;
  65. KeywordMethodHeaderSegment = Keyword, OptionalWhitespace, BindableIdentifier;
  66. KeywordMethodHeader = KeywordMethodHeaderSegment, {Whitespace, KeywordMethodHeaderSegment};
  67. MethodHeader =
                    UnaryMethodHeader
                    | BinaryMethodHeader
                    | KeywordMethodHeader;
  68. MethodDeclaration = OptionalWhitespace, MethodHeader, ExecutableCode;
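
The message-send rules (46 through 56) are where Smalltalk's precedence lives: unary messages bind tightest, then binary messages (strictly left to right, with no arithmetic precedence), then a single keyword message. The following is only a rough Python sketch of that part of the grammar, not a real parser. It assumes tokens are separated by spaces, treats every operand as a single token, and ignores nested expressions and cascades. The name `group` is made up for this illustration.

```python
import re

IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")   # rule 10
KEYWORD = re.compile(r"[A-Za-z_][A-Za-z0-9_]*:")     # rule 17
BINARY = re.compile(r"[~!@%&*\-+=|\\<>,?/]{1,2}")    # rules 19-20


def group(source):
    """Return `source` with every message send explicitly parenthesized.

    Illustrative only: operands are single space-separated tokens.
    """
    tokens = source.split()
    pos = 0

    def next_is(pattern):
        return pos < len(tokens) and pattern.fullmatch(tokens[pos]) is not None

    def take():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of expression")
        token = tokens[pos]
        pos += 1
        return token

    def unary_chain(receiver):
        # Rule 47: zero or more unary messages, applied left to right.
        while next_is(IDENTIFIER):
            receiver = f"({receiver} {take()})"
        return receiver

    def binary_operand():
        # Rule 48: an operand followed by its own unary messages.
        return unary_chain(take())

    def binary_chain(receiver):
        # Rule 50: zero or more binary messages, applied left to right.
        while next_is(BINARY):
            selector = take()
            receiver = f"({receiver} {selector} {binary_operand()})"
        return receiver

    def keyword_argument():
        # Rule 51: a keyword argument may contain unary and binary sends.
        return binary_chain(binary_operand())

    # Rules 54 and 56: unary sends, then binary sends, then one keyword message.
    receiver = binary_chain(binary_operand())
    if next_is(KEYWORD):
        segments = []
        while next_is(KEYWORD):
            keyword = take()
            segments.append(f"{keyword} {keyword_argument()}")
        receiver = f"({receiver} {' '.join(segments)})"
    if pos != len(tokens):
        raise ValueError(f"unexpected token {tokens[pos]!r}")
    return receiver
```

For example, `group("2 + 3 * 4")` groups the addition first, because rule 50 simply applies binary messages in the order they appear.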

To resolve any ambiguities that may arise due to the absence of optional whitespace, lower-numbered production rules take precedence over higher-numbered production rules. The ambiguity issue is normally taken care of by having production rules 1 through 38 handled by the lexical analyzer, and the remainder (production rules 40 through 68) handled by the parser.
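
To make the lexer half of that split a bit more concrete, here is a small Python sketch of a tokenizer covering a handful of the lexical rules. It is a simplification under several assumptions: only unsigned decimal integers, strings, identifiers, keywords and binary selectors are recognized, and the token classes are tried in a fixed order chosen for this sketch (keywords before plain identifiers, so that `at:` stays one token). The names `TOKEN_SPEC` and `tokenize` are invented for the example.

```python
import re

# Token classes are tried in this order; the first one that matches wins.
# The ordering is a choice made for this sketch.
TOKEN_SPEC = [
    ("whitespace", r'[ \t\n]+|"[^"]*"'),             # rules 2, 5-8: comments act as whitespace
    ("keyword", r"[A-Za-z_][A-Za-z0-9_]*:"),         # rule 17
    ("identifier", r"[A-Za-z_][A-Za-z0-9_]*"),       # rule 10
    ("string", r"'(?:[^']|'')*'"),                   # rules 30-31
    ("integer", r"[0-9]+"),                          # rule 23 only (no radix, no sign)
    ("binary", r"[~!@%&*\-+=|\\<>,?/]{1,2}"),        # rules 19-20
]
TOKEN_PATTERNS = [(kind, re.compile(pattern)) for kind, pattern in TOKEN_SPEC]


def tokenize(source):
    """Split `source` into (kind, text) pairs, dropping whitespace and comments."""
    tokens = []
    pos = 0
    while pos < len(source):
        for kind, pattern in TOKEN_PATTERNS:
            match = pattern.match(source, pos)
            if match:
                if kind != "whitespace":
                    tokens.append((kind, match.group()))
                pos = match.end()
                break
        else:
            # No token class matched at this position.
            raise ValueError(f"unexpected character {source[pos]!r} at offset {pos}")
    return tokens
```

With this in place, `tokenize("3+4")` yields three tokens even though the source contains no whitespace at all, which is the kind of case the paragraph above is concerned with. A parser for the later rules would then work on the token list rather than on raw characters.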



When compiling a method declaration, MethodDeclaration is the grammatical start symbol. When compiling executable code that's not a method definition, ExecutableCode is the start symbol. (It's possible, for example, to select a section of a method's code in a code browser and execute it without invoking the method itself; the same is true of the text of code comments, if they happen to contain valid Smalltalk code.)
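
The practical difference between the two start symbols is the MethodHeader that precedes the executable code in a method declaration. As a rough illustration of rules 63 through 67, the Python sketch below takes a header on its own and works out the selector and the argument names. It assumes the header has already been separated from the method body and that its tokens are separated by spaces. The function names are hypothetical.

```python
import re

IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")   # rule 10
KEYWORD = re.compile(r"[A-Za-z_][A-Za-z0-9_]*:")     # rule 17
BINARY = re.compile(r"[~!@%&*\-+=|\\<>,?/]{1,2}")    # rules 19-20
RESERVED = {"nil", "false", "true", "self", "super", "thisContext"}  # rules 12-14


def bindable(name):
    """Rule 15: an identifier that is not a reserved identifier."""
    if IDENTIFIER.fullmatch(name) is None or name in RESERVED:
        raise ValueError(f"not a bindable identifier: {name!r}")
    return name


def parse_method_header(header):
    """Return (selector, argument_names) for a space-separated method header."""
    tokens = header.split()
    if not tokens:
        raise ValueError("empty method header")
    first = tokens[0]

    if IDENTIFIER.fullmatch(first):
        # Rule 63: a unary header is just the selector.
        if len(tokens) != 1:
            raise ValueError("a unary method header takes no arguments")
        return first, []

    if BINARY.fullmatch(first):
        # Rule 64: a binary selector followed by exactly one argument name.
        if len(tokens) != 2:
            raise ValueError("a binary method header takes exactly one argument")
        return first, [bindable(tokens[1])]

    # Rules 65-66: one or more (keyword, argument name) pairs.
    if len(tokens) % 2 != 0:
        raise ValueError("each keyword needs exactly one argument name")
    selector = ""
    arguments = []
    for keyword, argument in zip(tokens[0::2], tokens[1::2]):
        if KEYWORD.fullmatch(keyword) is None:
            raise ValueError(f"expected a keyword, found {keyword!r}")
        selector += keyword
        arguments.append(bindable(argument))
    return selector, arguments
```

For instance, `parse_method_header("at: index put: value")` gives the selector `at:put:` with the arguments `index` and `value`. A header such as `at: self` is rejected, because rule 15 excludes reserved identifiers from being bound as argument names.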

