#!/usr/bin/ruby
# encoding: utf-8

=begin LICENSE

[The "BSD licence"]
Copyright (c) 2009-2010 Kyle Yetter
All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:

 1. Redistributions of source code must retain the above copyright
    notice, this list of conditions and the following disclaimer.
 2. Redistributions in binary form must reproduce the above copyright
    notice, this list of conditions and the following disclaimer in the
    documentation and/or other materials provided with the distribution.
 3. The name of the author may not be used to endorse or promote products
    derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR
IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.
IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT,
INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT
NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF
THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

=end

module ANTLR3

=begin rdoc ANTLR3::Token

At a minimum, tokens are data structures that bind together a chunk of text and
a corresponding type symbol, which categorizes the content of the text. Tokens
also usually carry information about their location in the input, such as
absolute character index, line number, and position within the line (or
column).

Furthermore, ANTLR tokens are assigned a "channel" number, an extra degree of
categorization that groups things on a larger scale. Parsers will usually ignore
tokens that have channel value 99 (the HIDDEN_CHANNEL), so you can keep things
like comments and whitespace huddled together with neighboring tokens,
effectively ignoring them without discarding them.

ANTLR tokens also keep a reference to the source stream from which they
originated. Token streams will also provide an index value for the token, which
indicates the position of the token relative to other tokens in the stream,
starting at zero. For example, the 22nd token pulled from a lexer by
CommonTokenStream will have index value 21.

== Token as an Interface

This library provides a token implementation (see CommonToken). Additionally,
you may write your own token class as long as you provide methods that give
access to the attributes expected by a token. Even though most of the ANTLR
library tries to use duck-typing techniques instead of pure object-oriented type
checking, it's a good idea to include the ANTLR3::Token module in your custom
token class.

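As a rough illustration of the duck-typed interface described above, here is a
minimal sketch of a custom token class. It does not load the ANTLR3 runtime:
MiniToken, its HIDDEN constant, and the default channel value of 0 are
hypothetical stand-ins, and a real custom class would most likely just include
ANTLR3::Token rather than define these helpers itself.

```ruby
# Hypothetical stand-in; a real custom token would include ANTLR3::Token.
class MiniToken
  include Comparable

  HIDDEN = 99   # mirrors the HIDDEN_CHANNEL value mentioned above

  attr_accessor :text, :type, :line, :column, :channel, :index,
                :input, :start, :stop

  # channel 0 is an assumed default, chosen only for this sketch
  def initialize( type, text, index, channel = 0 )
    @type, @text, @index, @channel = type, text, index, channel
  end

  # order tokens by their position in the token stream
  def <=>( other )
    index <=> other.index
  end

  def hidden?
    channel == HIDDEN
  end
end

tokens = [
  MiniToken.new( 5, "x", 0 ),
  MiniToken.new( 6, " ", 1, MiniToken::HIDDEN ),
  MiniToken.new( 7, "=", 2 )
]

# hidden tokens stay in the list but can be skipped on demand
visible = tokens.reject { |t| t.hidden? }
```

The whitespace token is kept alongside its neighbors and filtered out only when
needed, which is the "ignore without discarding" behavior that channels allow.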
=end

module Token
  include ANTLR3::Constants
  include Comparable

  # the token's associated chunk of text
  attr_accessor :text

  # the integer value associated with the token's type
  attr_accessor :type

  # the text's starting line number within the source (indexed starting at 1)
  attr_accessor :line

  # the text's starting position in the line within the source (indexed starting at 0)
  attr_accessor :column

  # the integer value of the channel to which the token is assigned
  attr_accessor :channel

  # the index of the token with respect to the other tokens produced during lexing
  attr_accessor :index

  # a reference to the input stream from which the token was extracted
  attr_accessor :input

  # the absolute character index in the input at which the text starts
  attr_accessor :start

  # the absolute character index in the input at which the text ends
  attr_accessor :stop

  alias :input_stream :input
  alias :input_stream= :input=
  alias :token_index :index
  alias :token_index= :index=

  #
  # The match operator has been implemented to match against several different
  # attributes of a token for convenience in quick scripts
  #
  # @example Match against an integer token type constant
  #   token =~ VARIABLE_NAME   => true/false
  # @example Match against a token type name as a Symbol
  #   token =~ :FLOAT          => true/false
  # @example Match the token text against a Regular Expression
  #   token =~ /^@[a-z_]\w*$/i
  # @example Compare the token's text to a string
  #   token =~ "class"
  #
  def =~ obj
    case obj
    when Integer then type == obj
    when Symbol then name == obj.to_s
    when Regexp then obj =~ text
    when String then text == obj
    else super
    end
  end

  #
  # Tokens are comparable by their stream index values
  #
  def <=> tk2
    index <=> tk2.index
  end

  # A copied token keeps the original's attributes (with its own copy of the
  # text) but is detached from any token stream, so its index is reset to -1
  def initialize_copy( orig )
    self.index   = -1
    self.type    = orig.type
    self.channel = orig.channel
    self.text    = orig.text.clone if orig.text
    self.start   = orig.start
    self.stop    = orig.stop
    self.line    = orig.line
    self.column  = orig.column
    self.input   = orig.input
  end

  # true if the token has an input stream as well as start and stop positions
  def concrete?
    input && start && stop ? true : false
  end

  # the opposite of concrete?
  def imaginary?
    input && start && stop ? false : true
  end

  # the name of the token's type
  def name
    token_name( type )
  end

  # the source name reported by the input stream, or nil if there is no input
  def source_name
    i = input and i.source_name
  end

  def hidden?
    channel == HIDDEN_CHANNEL
  end

  # the text between start and stop in the input if the token is concrete;
  # otherwise, the token's own text attribute
  def source_text
    concrete? ? input.substring( start, stop ) : text
  end

  #
  # Sets the token's channel value to HIDDEN_CHANNEL
  #
  def hide!
    self.channel = HIDDEN_CHANNEL
  end

  def inspect
    text_inspect    = text  ? "[#{ text.inspect }] " : ' '
    text_position   = line > 0  ? "@ line #{ line } col #{ column } " : ''
    stream_position = start ? "(#{ range.inspect })" : ''

    front =  index >= 0 ? "#{ index } " : ''
    rep = front << name << text_inspect <<
                text_position << stream_position
    rep.strip!
    channel == DEFAULT_CHANNEL or rep << " (#{ channel.to_s })"
    return( rep )
  end

  def pretty_print( printer )
    printer.text( inspect )
  end

  # the start..stop character range of the token, or nil if it cannot be built
  def range
    start..stop rescue nil
  end

  def to_i
    index.to_i
  end

  def to_s
    text.to_s
  end

private

  def token_name( type )
    BUILT_IN_TOKEN_NAMES[ type ]
  end
end

CommonToken = Struct.new( :type, :channel, :text, :input, :start,
                         :stop, :index, :line, :column )

=begin rdoc ANTLR3::CommonToken

The base class for the standard implementation of Token. It is implemented as a
simple Struct because tokens are basically simple data structures that bind
together several pieces of information, and Structs are slightly faster than a
standard Object with hand-written accessor methods.

By default, ANTLR-generated Ruby code will provide a customized subclass of
CommonToken to track token-type names efficiently for debugging, inspection, and
general utility. Thus code generated for a standard combo lexer-parser grammar
named XYZ will have a base module named XYZ and a customized CommonToken
subclass named XYZ::Token.

Here is the token structure attribute list in order:

* <tt>type</tt>
* <tt>channel</tt>
* <tt>text</tt>
* <tt>input</tt>
* <tt>start</tt>
* <tt>stop</tt>
* <tt>index</tt>
* <tt>line</tt>
* <tt>column</tt>

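Because the attribute order matters for positional construction, the following
sketch may help. It assumes nothing beyond Ruby's standard Struct: SketchToken
is a hypothetical mirror of the member list above, not the real CommonToken, and
it has none of CommonToken's default values.

```ruby
# Hypothetical mirror of the attribute order listed above; not the real class.
SketchToken = Struct.new( :type, :channel, :text, :input, :start,
                          :stop, :index, :line, :column )

# positional construction follows the member order; omitted members are nil
tk = SketchToken.new( 5, 0, "foo" )

# hash-style construction: map a Hash of fields onto the member order
fields = { :type => 5, :text => "foo", :line => 1, :column => 0 }
tk2 = SketchToken.new( *SketchToken.members.map { |m| fields[ m.to_sym ] } )
```

Mapping a Hash onto the member list avoids having to remember the positional
order when only a few attributes are known.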
=end

class CommonToken
  include Token
  DEFAULT_VALUES = {
    :channel => DEFAULT_CHANNEL,
    :index   => -1,
    :line    =>  0,
    :column  => -1
  }.freeze

  def self.token_name( type )
    BUILT_IN_TOKEN_NAMES[ type ]
  end

  def self.create( fields = {} )
    fields = DEFAULT_VALUES.merge( fields )
    args = members.map { |name| fields[ name.to_sym ] }
    new( *args )
  end

  # allows you to make a copy of a token with a different class
  def self.from_token( token )
    new(
      token.type,  token.channel, token.text ? token.text.clone : nil,
      token.input, token.start,   token.stop, -1, token.line, token.column
    )
  end

  def initialize( type = nil, channel = DEFAULT_CHANNEL, text = nil,
                 input = nil, start = nil, stop = nil, index = -1,
                 line = 0, column = -1 )
    super
    block_given? and yield( self )
    self.text.nil? && self.start && self.stop and
      self.text = self.input.substring( self.start, self.stop )
  end

  alias :input_stream :input
  alias :input_stream= :input=
  alias :token_index :index
  alias :token_index= :index=
end

module Constants

  # End of File / End of Input character and token type
  EOF_TOKEN = CommonToken.new( EOF ).freeze
  INVALID_TOKEN = CommonToken.new( INVALID_TOKEN_TYPE ).freeze
  SKIP_TOKEN = CommonToken.new( INVALID_TOKEN_TYPE ).freeze
end



=begin rdoc ANTLR3::TokenSource

TokenSource is a simple mixin module that demands an
implementation of the method #next_token. In return, it
defines methods #next and #each, which provide basic
iterator methods for token generators. Furthermore, it
includes Enumerable to provide the standard Ruby iteration
methods to token generators, like lexers.

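The pattern can be sketched without the runtime as follows. Everything here is
a hypothetical stand-in: EOF_TYPE (assumed to be -1), Tok, SketchTokenSource,
and WordSource exist only for illustration and are not part of this library.

```ruby
# Hypothetical stand-ins; none of these names come from the ANTLR3 runtime.
EOF_TYPE = -1
Tok = Struct.new( :type, :text )

module SketchTokenSource
  include Enumerable

  # yield tokens from #next_token until nil or an EOF-typed token appears
  def each
    block_given? or return enum_for( :each )
    while token = next_token and token.type != EOF_TYPE
      yield( token )
    end
    self
  end
end

class WordSource
  include SketchTokenSource

  def initialize( string )
    @words = string.split
  end

  # the one method the mixin requires from its host class
  def next_token
    word = @words.shift
    word ? Tok.new( 1, word ) : Tok.new( EOF_TYPE, nil )
  end
end

# Enumerable methods such as #map come for free once #each is defined
texts = WordSource.new( "a b c" ).map { |t| t.text }
```

The host class supplies only #next_token; iteration stops at the first EOF-typed
token, which is never yielded.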
=end

module TokenSource
  include Constants
  include Enumerable
  extend ClassMacros

  abstract :next_token

  def next
    token = next_token()
    raise StopIteration if token.nil? || token.type == EOF
    return token
  end

  def each
    block_given? or return enum_for( :each )
    while token = next_token and token.type != EOF
      yield( token )
    end
    return self
  end

  def to_stream( options = {} )
    if block_given?
      CommonTokenStream.new( self, options ) { | t, stream | yield( t, stream ) }
    else
      CommonTokenStream.new( self, options )
    end
  end
end


=begin rdoc ANTLR3::TokenFactory

There are a variety of different entities throughout the ANTLR runtime library
that need to create token objects. This module serves as a mixin that provides
methods for constructing tokens.

Including this module provides a +token_class+ attribute. Instances of the
including class can create tokens using the token class (which defaults to
ANTLR3::CommonToken). Token classes are presumed to have an #initialize method
that can be called without any parameters, and the token objects are expected to
have the standard token attributes (see ANTLR3::Token).

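A simplified sketch of the factory idea, assuming a plain default class in
place of the runtime's lookup chain. DefaultTok, FancyTok, SketchTokenFactory,
and Builder are hypothetical names used only for this illustration.

```ruby
# Hypothetical stand-ins for illustration; not the runtime's own classes.
DefaultTok = Struct.new( :type, :text )
FancyTok   = Struct.new( :type, :text )

module SketchTokenFactory
  attr_writer :token_class

  # fall back to a default class when none has been assigned
  def token_class
    @token_class ||= DefaultTok
  end

  # build a token with whatever class is currently configured
  def create_token( *args )
    token_class.new( *args )
  end
end

class Builder
  include SketchTokenFactory
end

b = Builder.new
plain = b.create_token( 1, "a" )   # uses the default class
b.token_class = FancyTok           # swap the class for later tokens
fancy = b.create_token( 2, "b" )
```

Code that creates tokens only ever calls +create_token+, so the concrete token
class can be changed per instance without touching that code.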
=end

module TokenFactory
  attr_writer :token_class
  def token_class
    @token_class ||= begin
      self.class.token_class rescue
      self::Token rescue
      ANTLR3::CommonToken
    end
  end

  def create_token( *args )
    if block_given?
      token_class.new( *args ) do |*targs|
        yield( *targs )
      end
    else
      token_class.new( *args )
    end
  end
end


=begin rdoc ANTLR3::TokenScheme

TokenSchemes exist to handle the problem of defining token types as integer
values while maintaining meaningful text names for the types. They are
dynamically defined modules that map integer values to constants with token-type
names.

---

Fundamentally, tokens exist to take a chunk of text and identify it as belonging
to some category, like "VARIABLE" or "INTEGER". In code, the category is
represented by an integer -- some arbitrary value that ANTLR will decide to use
as it is creating the recognizer. The purpose of using an integer (instead of,
say, a Ruby symbol) is that ANTLR's decision logic often needs to test whether a
token's type falls within a range, which is not possible with symbols.

The downside of token types being represented as integers is that a developer
needs to be able to reference the unknown type value by name in action code.
Furthermore, code that references the type by name and tokens that can be
inspected with names in place of type values are more meaningful to a developer.

Since ANTLR requires token type names to follow capital-letter naming
conventions, defining types as named constants of the recognizer class resolves
the problem of referencing type values by name. Thus, a token type like
"VARIABLE" can be represented by a number like 5 and referenced within code by
+VARIABLE+. However, when a recognizer creates tokens, the name of the token's
type cannot be seen without using the data defined in the recognizer.

401324c4644fee44b9898524c09511bd33c3f12e2dfBen GruverOf course, tokens could be defined with a name attribute that could be specified
402324c4644fee44b9898524c09511bd33c3f12e2dfBen Gruverwhen tokens are created. However, doing so would make tokens take up more space
403324c4644fee44b9898524c09511bd33c3f12e2dfBen Gruverthan necessary, as well as making it difficult to change the type of a token
404324c4644fee44b9898524c09511bd33c3f12e2dfBen Gruverwhile maintaining a correct name value.
405324c4644fee44b9898524c09511bd33c3f12e2dfBen Gruver
406324c4644fee44b9898524c09511bd33c3f12e2dfBen GruverTokenSchemes exist as a technique to manage token type referencing and name
407324c4644fee44b9898524c09511bd33c3f12e2dfBen Gruverextraction. They:
408324c4644fee44b9898524c09511bd33c3f12e2dfBen Gruver
409324c4644fee44b9898524c09511bd33c3f12e2dfBen Gruver1. keep token type references clear and understandable in recognizer code
410324c4644fee44b9898524c09511bd33c3f12e2dfBen Gruver2. permit access to a token's type-name independently of recognizer objects
411324c4644fee44b9898524c09511bd33c3f12e2dfBen Gruver3. allow multiple classes to share the same token information
412324c4644fee44b9898524c09511bd33c3f12e2dfBen Gruver
== Building Token Schemes

TokenScheme is a subclass of Module. Thus, it has the method
<tt>TokenScheme.new(tk_class = nil) { ... module-level code ... }</tt>, which
will evaluate the block in the context of the scheme (module), similarly to
Module#module_eval. Before evaluating the block, <tt>.new</tt> will set up the
module with the following actions:

1. define a customized token class (more on that below)
2. add a new constant, TOKEN_NAMES, which is a hash that maps types to names
3. dynamically populate the new scheme module with a few instance methods
   (+token_scheme+, +token_names+, and +token_name+)
4. include ANTLR3::Constants in the new scheme module

Because the TokenScheme class functions as a metaclass, some of the scoping
behavior can be mildly confusing if you're trying to get a handle on the
entity for your own purposes. Remember that all of the instance methods of
TokenScheme function as module-level methods of TokenScheme instances, like
+attr_accessor+ and friends.

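The scoping rule can be seen in a small standalone sketch. The +Registry+ class
and its +entry+ method are hypothetical names invented for illustration and are
not part of ANTLR3. The sketch only assumes that a Module subclass can be
instantiated with a block, in the same way TokenScheme is:

```ruby
# Hypothetical stand-in for TokenScheme: a Module subclass whose
# instance methods act as module-level methods of each instance.
class Registry < Module
  def self.new( &body )
    super() do
      @entries = []
      body and module_eval( &body )
    end
  end

  attr_reader :entries

  # callable with an implicit receiver inside the block given to .new
  def entry( name )
    @entries << name
    self
  end
end

Colors = Registry.new do
  entry :red
  entry :green
end

Colors.entries        # => [:red, :green]
Colors.entry( :blue ) # the same method, called from outside the block
Colors.entries        # => [:red, :green, :blue]
```
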
<tt>TokenScheme#define_token(name_symbol, int_value)</tt> adds a constant
definition <tt>name_symbol</tt> with the value <tt>int_value</tt>. If
<tt>int_value</tt> is omitted, the next unused type value is assigned. It is
essentially like <tt>Module#const_set</tt>, except it forbids constant
overwriting (which would mess up recognizer code fairly badly) and adds an
inverse type-to-name entry to its own <tt>TOKEN_NAMES</tt> table.
<tt>TokenScheme#define_tokens</tt> is a convenience method for defining many
types with a hash pairing names to values.

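As a rough illustration of that behavior, here is a simplified standalone
sketch. +MiniScheme+ and +SampleTypes+ are hypothetical names, and the sketch
leaves out automatic numbering and the built-in token types, so it should not
be read as the actual implementation:

```ruby
# Hypothetical, simplified scheme; not the ANTLR3 implementation.
class MiniScheme < Module
  def self.new( &body )
    super() do
      const_set( :TOKEN_NAMES, {} )
      body and module_eval( &body )
    end
  end

  def define_token( name, value )
    name = name.to_s
    if const_defined?( name, false )
      # redefinition is only tolerated when the value is unchanged
      current = const_get( name, false )
      unless current == value
        raise NameError, "#{ name } = #{ value } conflicts with #{ name } = #{ current }"
      end
    else
      const_set( name, value )
      self::TOKEN_NAMES[ value ] = name   # inverse, type-to-name entry
    end
    self
  end

  def define_tokens( token_map )
    token_map.each { |name, value| define_token( name, value ) }
    self
  end
end

SampleTypes = MiniScheme.new do
  define_tokens( :INT => 4, :ID => 6 )
end

SampleTypes::ID               # => 6
SampleTypes::TOKEN_NAMES[ 4 ] # => "INT"

begin
  SampleTypes.define_token( :ID, 9 )
rescue NameError => error
  error.message               # => "ID = 9 conflicts with ID = 6"
end
```
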
<tt>TokenScheme#register_name(value, name_string)</tt> specifies a custom
type-to-name definition. This is particularly useful for the anonymous tokens
that ANTLR generates for literal strings in the grammar specification. For
example, if you refer to the literal <tt>'='</tt> in some parser rule in your
grammar, ANTLR will add a lexer rule for the literal and give the token a name
like <tt>T__<i>x</i></tt>, where <tt><i>x</i></tt> is the type's integer value.
Since this is pretty meaningless to a developer, generated code should add a
special name definition for type value <tt><i>x</i></tt> with the string
<tt>"'='"</tt>.

=== Sample TokenScheme Construction

  TokenData = ANTLR3::TokenScheme.new do
    define_tokens(
      :INT  => 4,
      :ID   => 6,
      :T__5 => 5,
      :WS   => 7
    )

    # note the self:: scoping below is due to the fact that
    # ruby lexically-scopes constant names instead of
    # looking up in the current scope
    register_name(self::T__5, "'='")
  end

  TokenData::ID           # => 6
  TokenData::T__5         # => 5
  TokenData.token_name(4) # => 'INT'
  TokenData.token_name(5) # => "'='"

  class ARecognizerOrSuch < ANTLR3::Parser
    include TokenData
    ID   # => 6
  end

== Custom Token Classes and Relationship with Tokens

When a TokenScheme is created, it will define a subclass of ANTLR3::CommonToken
and assign it to the constant name +Token+. This token class will both include
and extend the scheme module. Since token schemes define the private instance
method <tt>token_name(type)</tt>, instances of the token class are now able to
provide their type names. The Token method <tt>name</tt> uses the
<tt>token_name</tt> method to provide the type name as if it were a simple
attribute without storing the name itself.

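The idea of deriving the name from the type, rather than storing it, can be
sketched without the ANTLR3 runtime. +NameTable+ and +SketchToken+ below are
hypothetical stand-ins for a token scheme and its token class, and the type
values are arbitrary:

```ruby
# Hypothetical type table standing in for a token scheme.
module NameTable
  TOKEN_NAMES = { 4 => "INT", 6 => "ID" }

  def token_name( type )
    TOKEN_NAMES[ type ]
  end
  # leaves a private instance method plus a public module-level copy
  module_function :token_name
end

# Hypothetical token class; it stores a type and text, but no name.
class SketchToken < Struct.new( :type, :text )
  include NameTable

  def name
    token_name( type )
  end
end

token = SketchToken.new( 4, "42" )
token.name                # => "INT"
token.type = 6
token.name                # => "ID"
NameTable.token_name( 4 ) # => "INT"
```

Because the name is looked up on demand, changing a token's type can never
leave a stale name behind.
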
When a TokenScheme is included in a recognizer class, the class will now have
the token types as named constants, a type-to-name map constant +TOKEN_NAMES+,
and a grammar-specific subclass of ANTLR3::CommonToken assigned to the constant
+Token+. Thus, when recognizers need to manufacture tokens, instead of using the
generic CommonToken class, they can create tokens using the customized Token
class provided by the token scheme.

If you need to use a token class other than CommonToken, you can pass the class
as a parameter to TokenScheme.new. The scheme will then use that class in place
of the dynamically created CommonToken subclass.

=end

class TokenScheme < ::Module
  include TokenFactory

  def self.new( tk_class = nil, &body )
    super() do
      # use an anonymous CommonToken subclass unless a token class was given
      tk_class ||= Class.new( ::ANTLR3::CommonToken )
      self.token_class = tk_class

      const_set( :TOKEN_NAMES, ::ANTLR3::Constants::BUILT_IN_TOKEN_NAMES.clone )

      # @types maps names to type values; @unused is the next free type value
      @types  = ::ANTLR3::Constants::BUILT_IN_TOKEN_NAMES.invert
      @unused = ::ANTLR3::Constants::MIN_TOKEN_TYPE

      scheme = self
      define_method( :token_scheme ) { scheme }
      define_method( :token_names )  { scheme::TOKEN_NAMES }
      define_method( :token_name ) do |type|
        begin
          # super needs explicit arguments inside a define_method block
          token_names[ type ] or super( type )
        rescue NoMethodError
          ::ANTLR3::CommonToken.token_name( type )
        end
      end
      module_function :token_name, :token_names

      include ANTLR3::Constants

      body and module_eval( &body )
    end
  end

  def self.build( *token_names )
    token_names = token_names.flatten
    token_names.compact!
    token_names.uniq!
    tk_class = Class === token_names.first ? token_names.shift : nil
    value_maps, names = token_names.partition { |i| Hash === i }
    new( tk_class ) do
      for value_map in value_maps
        define_tokens( value_map )
      end

      for name in names
        define_token( name )
      end
    end
  end

  def included( mod )
    super
    mod.extend( self )
  end
  private :included

  attr_reader :unused, :types

  def define_tokens( token_map = {} )
    for token_name, token_value in token_map
      define_token( token_name, token_value )
    end
    return self
  end

  def define_token( name, value = nil )
    name = name.to_s

    if current_value = @types[ name ]
      # token type has already been defined
      # raise an error unless value is the same as the current value
      value ||= current_value
      unless current_value == value
        raise NameError.new(
          "new token type definition ``#{ name } = #{ value }'' conflicts " <<
          "with existing type definition ``#{ name } = #{ current_value }''", name
        )
      end
    else
      value ||= @unused
      if name =~ /^[A-Z]\w*$/
        const_set( name, @types[ name ] = value )
      else
        constant = "T__#{ value }"
        const_set( constant, @types[ constant ] = value )
        @types[ name ] = value
      end
      register_name( value, name ) unless built_in_type?( value )
    end

    value >= @unused and @unused = value + 1
    return self
  end

  def register_names( *names )
    if names.length == 1 and Hash === names.first
      names.first.each do |value, name|
        register_name( value, name )
      end
    else
      names.each_with_index do |name, i|
        type_value = Constants::MIN_TOKEN_TYPE + i
        register_name( type_value, name )
      end
    end
  end

  def register_name( type_value, name )
    name = name.to_s.freeze
    if token_names.has_key?( type_value )
      current_name = token_names[ type_value ]
      current_name == name and return name

      if current_name == "T__#{ type_value }"
        # only an anonymous name is registered -- upgrade the name to the full literal name
        token_names[ type_value ] = name
      elsif name == "T__#{ type_value }"
        # ignore name downgrade from literal to anonymous constant
        return current_name
      else
        error = NameError.new(
          "attempted assignment of token type #{ type_value }" <<
          " to name #{ name } conflicts with existing name #{ current_name }", name
        )
        raise error
      end
    else
      # name is already a frozen string at this point
      token_names[ type_value ] = name
    end
  end

  def built_in_type?( type_value )
    Constants::BUILT_IN_TOKEN_NAMES.fetch( type_value, false ) and true
  end

  def token_defined?( name_or_value )
    # dispatch on the argument itself (the original referenced an undefined +value+)
    case name_or_value
    when Integer then token_names.has_key?( name_or_value )
    else const_defined?( name_or_value.to_s )
    end
  end

  def []( name_or_value )
    case name_or_value
    when Integer then token_names.fetch( name_or_value, nil )
    # Hash#key replaces Hash#index, which was removed in Ruby 3.0
    else const_get( name_or_value.to_s ) rescue token_names.key( name_or_value )
    end
  end

  def token_class
    self::Token
  end

  def token_class=( klass )
    Class === klass or raise( TypeError, "token_class must be a Class" )
    Util.silence_warnings do
      klass < self or klass.send( :include, self )
      const_set( :Token, klass )
    end
  end

end

end