#!/usr/bin/ruby
# encoding: utf-8

=begin LICENSE

[The "BSD licence"]
Copyright (c) 2009-2010 Kyle Yetter
All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:

 1. Redistributions of source code must retain the above copyright
    notice, this list of conditions and the following disclaimer.
 2. Redistributions in binary form must reproduce the above copyright
    notice, this list of conditions and the following disclaimer in the
    documentation and/or other materials provided with the distribution.
 3. The name of the author may not be used to endorse or promote products
    derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR
IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.
IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT,
INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT
NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF
THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

=end

module ANTLR3


=begin rdoc ANTLR3::Stream

= ANTLR3 Streams

This documentation first covers the general concept of streams as used by ANTLR
recognizers, and then discusses the specific <tt>ANTLR3::Stream</tt> module.

== ANTLR Stream Classes

ANTLR recognizers need a way to walk through input data in a serialized IO-style
fashion. They also need some book-keeping about the input to provide useful
information to developers, such as the current line number and column. Furthermore,
to implement backtracking and various error recovery techniques, recognizers
need a way to record various locations in the input at a number of points in the
recognition process so the input state may be restored to a prior state.

ANTLR bundles all of this functionality into a number of Stream classes, each
designed to be used by recognizers for a specific recognition task. Most of the
Stream hierarchy is implemented in antlr3/stream.rb, which is loaded by default
when 'antlr3' is required.

---

Here's a brief overview of the various stream classes and their respective
purposes:

StringStream::
  Similar to StringIO from the standard Ruby library, StringStream wraps raw
  String data in a Stream interface for use by ANTLR lexers.
FileStream::
  A subclass of StringStream, FileStream simply wraps data read from an IO or
  File object for use by lexers.
CommonTokenStream::
  The job of a TokenStream is to read lexer output and then provide ANTLR
  parsers with the means to walk sequentially through a series of tokens.
  CommonTokenStream is the default TokenStream implementation.
TokenRewriteStream::
  A subclass of CommonTokenStream, TokenRewriteStream gives rewriting parsers
  the ability to produce new output text from an input token sequence by
  managing rewrite "programs" on top of the stream.
CommonTreeNodeStream::
  In a similar fashion to CommonTokenStream, CommonTreeNodeStream feeds tree
  nodes to recognizers in a sequential fashion. However, the stream object
  serializes an Abstract Syntax Tree into a flat, one-dimensional sequence, but
  preserves the two-dimensional shape of the tree using special UP and DOWN
  tokens. The sequence is primarily used by ANTLR Tree Parsers. *note* -- this
  is not defined in antlr3/stream.rb, but in antlr3/tree.rb

---

The next few sections cover the most significant methods of all stream classes.

=== consume / look / peek

<tt>stream.consume</tt> is used to advance a stream one unit. StringStreams are
advanced by one character and TokenStreams are advanced by one token.

<tt>stream.peek(k = 1)</tt> is used to quickly retrieve the object of interest
to a recognizer at the look-ahead position specified by <tt>k</tt>. For
<b>StringStreams</b>, this is the <i>integer value of the character</i>
<tt>k</tt> characters ahead of the stream cursor. For <b>TokenStreams</b>, this
is the <i>integer token type of the token</i> <tt>k</tt> tokens ahead of the
stream cursor.

<tt>stream.look(k = 1)</tt> is used to retrieve the full object of interest at
the look-ahead position specified by <tt>k</tt>. While <tt>peek</tt> provides the
<i>bare-minimum lightweight information</i> that the recognizer needs,
<tt>look</tt> provides the <i>full object of concern</i> in the stream. For
<b>StringStreams</b>, this is a <i>string object containing the single
character</i> <tt>k</tt> characters ahead of the stream cursor. For
<b>TokenStreams</b>, this is the <i>full token structure</i> <tt>k</tt> tokens
ahead of the stream cursor.
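
For illustration, here is a minimal sketch (assuming a plain StringStream over
the text <tt>"abc"</tt>, not tied to any particular grammar):

  chars = ANTLR3::StringStream.new( "abc" )
  chars.peek        # => 97  -- the integer value of the character "a"
  chars.look        # => "a" -- the same character, returned as a String
  chars.consume     # advances the cursor past "a"
  chars.peek( 2 )   # => 99  -- the integer value of "c", two characters ahead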

<b>Note:</b> in most ANTLR runtime APIs for other languages, <tt>peek</tt> is
implemented by some method with a name like <tt>LA(k)</tt> and <tt>look</tt> is
implemented by some method with a name like <tt>LT(k)</tt>. When writing this
Ruby runtime API, I found this naming practice confusing, ambiguous, and
un-Ruby-like. Thus, I chose <tt>peek</tt> and <tt>look</tt> to represent a
quick-look (peek) and a full-fledged look-ahead operation (look). If this causes
confusion or any sort of compatibility strife for developers using this
implementation, all apologies.

=== mark / rewind / release

<tt>marker = stream.mark</tt> causes the stream to record important information
about the current stream state, place the data in an internal memory table, and
return a memento, <tt>marker</tt>. The marker object is typically an integer key
to the stream's internal memory table.

Used in tandem with <tt>stream.rewind(marker = last_marker)</tt>, the marker can
be used to restore the stream to an earlier state. This is used by recognizers
to perform tasks such as backtracking and error recovery.

<tt>stream.release(marker = last_marker)</tt> can be used to release an existing
state marker from the memory table.
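
A short sketch of the idea, continuing the StringStream example above:

  marker = chars.mark        # remember the current location
  3.times { chars.consume }  # speculatively advance through the input
  chars.rewind( marker )     # restore the position, line, and column saved above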

=== seek

<tt>stream.seek(position)</tt> moves the stream cursor to an absolute position
within the stream, much like Ruby's own <tt>IO#seek</tt>.
However, unlike <tt>IO#seek</tt>, ANTLR streams currently always use absolute
position seeking.
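
For example (a hypothetical use, continuing the sketch above):

  chars.seek( 0 )   # jump back to the very first character of the input
                    # (note: +line+ / +column+ are only adjusted for forward seeks)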

== The Stream Module

<tt>ANTLR3::Stream</tt> is an abstract-ish base mixin for all IO-like stream
classes used by ANTLR recognizers.

The module doesn't do much on its own besides define arguably annoying
``abstract'' pseudo-methods that demand implementation when it is mixed in to a
class that wants to be a Stream. Right now this exists as an artifact of porting
the ANTLR Java/Python runtime library to Ruby. In Java, of course, this is
represented as an interface. In Ruby, however, objects are duck-typed and
interfaces aren't that useful as programmatic entities -- in fact, it's mildly
wasteful to have a module like this hanging out. Thus, I may axe it.

When mixed in, it does give the class #size and #source_name attribute
methods.

Except in a small handful of places, most of the ANTLR runtime library uses
duck-typing and not type checking on objects. This means that the methods which
manipulate stream objects don't usually bother checking that the object is a
Stream and assume that the object implements the proper stream interface. Thus,
it is not strictly necessary that custom stream objects include ANTLR3::Stream,
though it isn't a bad idea.

=end

module Stream
  include ANTLR3::Constants
  extend ClassMacros

  ##
  # :method: consume
  # used to advance a stream one unit (such as a character or a token)
  abstract :consume

  ##
  # :method: peek( k = 1 )
  # used to quickly retrieve the object of interest to a recognizer at lookahead
  # position specified by <tt>k</tt> (such as the integer value of a character or an
  # integer token type)
  abstract :peek

  ##
  # :method: look( k = 1 )
  # used to retrieve the full object of interest at lookahead position specified
  # by <tt>k</tt> (such as a character string or a token structure)
  abstract :look

  ##
  # :method: mark
  # saves the current position for the purposes of backtracking and
  # returns a value to pass to #rewind at a later time
  abstract :mark

  ##
  # :method: index
  # returns the current position of the stream
  abstract :index

  ##
  # :method: rewind( marker = last_marker )
  # restores the stream position using the state information previously saved
  # by the given marker
  abstract :rewind

  ##
  # :method: release( marker = last_marker )
  # clears the saved state information associated with the given marker value
  abstract :release

  ##
  # :method: seek( position )
  # moves the stream to the absolute index given by +position+
  abstract :seek

  ##
  # the total number of symbols in the stream
  attr_reader :size

  ##
  # indicates an identifying name for the stream -- usually the file path of the input
  attr_accessor :source_name
end

=begin rdoc ANTLR3::CharacterStream

CharacterStream further extends the abstract-ish base mixin Stream to add
methods specific to navigating character-based input data. Thus, it serves as an
imitation of the Java interface for text-based streams, which are primarily
used by lexers.

It adds the ``abstract'' method, <tt>substring(start, stop)</tt>, which must be
implemented to return a slice of the input string from position <tt>start</tt>
to position <tt>stop</tt>. It also adds attribute accessor methods <tt>line</tt>
and <tt>column</tt>, which are expected to indicate the current line number and
position within the current line, respectively.

== A Word About <tt>line</tt> and <tt>column</tt> attributes

Presumably, the concept of <tt>line</tt> and <tt>column</tt> attributes of text
is familiar to most developers. Line numbers are indexed from 1 up (not 0).
Column numbers are indexed from 0 up. Thus, examining the sample text:

  Hey this is the first line.
  Oh, and this is the second line.

Line 1 is the string "Hey this is the first line.\\n". If a character stream is at
line 2, character 0, the stream cursor is sitting between the characters "\\n"
and "O".
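
As a rough sketch of how these attributes move (using the sample text above and
a plain StringStream):

  text  = "Hey this is the first line.\nOh, and this is the second line.\n"
  chars = ANTLR3::StringStream.new( text )
  chars.consume until chars.peek == ?\n.ord   # consume up to the first newline
  chars.consume                               # consume the "\n" itself
  chars.line                                  # => 2
  chars.column                                # => 0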

*Note:* most ANTLR runtime APIs for other languages refer to <tt>column</tt>
with the more precise, but lengthy, name <tt>charPositionInLine</tt>. I preferred
to keep it simple and familiar in this Ruby runtime API.

=end

module CharacterStream
  include Stream
  extend ClassMacros
  include Constants

  ##
  # :method: substring(start,stop)
  abstract :substring

  attr_accessor :line
  attr_accessor :column
end


=begin rdoc ANTLR3::TokenStream

TokenStream further extends the abstract-ish base mixin Stream to add methods
specific to navigating token sequences. Thus, it serves as an imitation of the
Java interface for token-based streams, which are used by many different
components in ANTLR, including parsers and tree parsers.

== Token Streams

Token streams wrap a sequence of token objects produced by some token source,
usually a lexer. They provide the operations required by higher-level
recognizers, such as parsers and tree parsers, for navigating through the
sequence of tokens. Unlike simple character-based streams, such as StringStream,
token-based streams have an additional level of complexity because they must
manage the task of "tuning" to a specific token channel.

One of the main advantages of ANTLR-based recognition is the token
<i>channel</i> feature, which allows you to hold on to all of the tokens
produced by the lexer while only presenting a specific subset of them to a
parser. For example, if you need to hide whitespace and comments from a parser,
but hang on to them for some other purpose, you have the lexer assign the
comments and whitespace to channel value HIDDEN as it creates the tokens.

When you create a token stream, you can tune it to some specific channel value.
Then, all <tt>peek</tt>, <tt>look</tt>, and <tt>consume</tt> operations only
yield tokens that have the same value for <tt>channel</tt>. The stream skips
over any non-matching tokens in between.
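
For example, a stream can be tuned to the hidden channel at construction time
(a sketch only -- +lexer+ here stands for any ANTLR3 lexer instance):

  hidden_tokens =
    ANTLR3::CommonTokenStream.new( lexer, :channel => ANTLR3::HIDDEN )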

== The TokenStream Interface

In addition to the abstract methods and attribute methods provided by the base
Stream module, TokenStream adds a number of further method implementation
requirements and attributes.

=end

module TokenStream
  include Stream
  extend ClassMacros

  ##
  # expected to return the token source object (such as a lexer) from which
  # all tokens in the stream were retrieved
  attr_reader :token_source

  ##
  # expected to return the value of the last marker produced by a call to
  # <tt>stream.mark</tt>
  attr_reader :last_marker

  ##
  # expected to return the integer index of the stream cursor
  attr_reader :position

  ##
  # the integer channel value to which the stream is ``tuned''
  attr_accessor :channel

  ##
  # :method: to_s(start=0,stop=tokens.length-1)
  # should take the tokens between start and stop in the sequence, extract their text
  # and return the concatenation of all the text chunks
  abstract :to_s

  ##
  # :method: at( i )
  # return the stream symbol at index +i+
  abstract :at
end

=begin rdoc ANTLR3::StringStream

A StringStream's purpose is to wrap the basic, naked text input of a recognition
system. Like all other stream types, it provides serial navigation of the input;
a recognizer can arbitrarily step forward and backward through the stream's
symbols as it requires. StringStream and its subclasses are the main way to
feed text input into an ANTLR Lexer for token processing.

The stream's symbols of interest, of course, are character values. Thus, the
#peek method returns the integer character value at look-ahead position
<tt>k</tt> and the #look method returns the character value as a +String+. The
stream also tracks various pieces of information, such as the line and column
numbers at the current position.
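
Construction accepts a handful of options to seed this book-keeping (a small
sketch; the file name used here is purely illustrative):

  chars = ANTLR3::StringStream.new( "int x;\n", :file => 'sample.c', :line => 10 )
  chars.source_name   # => "sample.c"
  chars.line          # => 10
  chars.column        # => 0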

=== Note About Text Encoding

This version of the runtime library primarily targets ruby version 1.8, which
does not have strong built-in support for multi-byte character encodings. Thus,
characters are assumed to be represented by a single byte -- an integer between
0 and 255. Ruby 1.9 does provide built-in encoding support for multi-byte
characters, but currently this library does not provide any streams to handle
non-ASCII encoding. However, encoding-savvy recognition code is a future
development goal for this project.

=end

class StringStream
  NEWLINE = ?\n.ord

  include CharacterStream

  # current integer character index of the stream
  attr_reader :position

  # the current line number of the input, indexed upward from 1
  attr_reader :line

  # the current character position within the current line, indexed upward from 0
  attr_reader :column

  # the name associated with the stream -- usually a file name
  # defaults to <tt>"(string)"</tt>
  attr_accessor :name

  # the entire string that is wrapped by the stream
  attr_reader :data
  attr_reader :string

  if RUBY_VERSION =~ /^1\.9/

    # creates a new StringStream object where +data+ is the string data to stream.
    # accepts the following options in a symbol-to-value hash:
    #
    # [:file or :name] the (file) name to associate with the stream; default: <tt>'(string)'</tt>
    # [:line] the initial line number; default: +1+
    # [:column] the initial column number; default: +0+
    #
    def initialize( data, options = {} )      # for 1.9
      @string   = data.to_s.encode( Encoding::UTF_8 ).freeze
      @data     = @string.codepoints.to_a.freeze
      @position = options.fetch :position, 0
      @line     = options.fetch :line, 1
      @column   = options.fetch :column, 0
      @markers  = []
      @name   ||= options[ :file ] || options[ :name ] # || '(string)'
      mark
    end

    #
    # identical to #peek, except it returns the character value as a String
    #
    def look( k = 1 )               # for 1.9
      k == 0 and return nil
      k += 1 if k < 0

      index = @position + k - 1
      index < 0 and return nil

      @string[ index ]
    end

  else

    # creates a new StringStream object where +data+ is the string data to stream.
    # accepts the following options in a symbol-to-value hash:
    #
    # [:file or :name] the (file) name to associate with the stream; default: <tt>'(string)'</tt>
    # [:line] the initial line number; default: +1+
    # [:column] the initial column number; default: +0+
    #
    def initialize( data, options = {} )    # for 1.8
      @data = data.to_s
      @data.equal?( data ) and @data = @data.clone
      @data.freeze
      @string = @data
      @position = options.fetch :position, 0
      @line = options.fetch :line, 1
      @column = options.fetch :column, 0
      @markers = []
      @name ||= options[ :file ] || options[ :name ] # || '(string)'
      mark
    end

    #
    # identical to #peek, except it returns the character value as a String
    #
    def look( k = 1 )                        # for 1.8
      k == 0 and return nil
      k += 1 if k < 0

      index = @position + k - 1
      index < 0 and return nil

      c = @data[ index ] and c.chr
    end

  end

  def size
    @data.length
  end

  alias length size

  #
  # rewinds the stream back to the start and clears out any existing marker entries
  #
  def reset
    initial_location = @markers.first
    @position, @line, @column = initial_location
    @markers.clear
    @markers << initial_location
    return self
  end

  #
  # advance the stream by one character; returns the character consumed
  #
  def consume
    c = @data[ @position ] || EOF
    if @position < @data.length
      @column += 1
      if c == NEWLINE
        @line += 1
        @column = 0
      end
      @position += 1
    end
    return( c )
  end

  #
  # return the character at look-ahead distance +k+ as an integer. <tt>k = 1</tt> represents
  # the current character. +k+ greater than 1 represents upcoming characters. A negative
  # value of +k+ returns previous characters consumed, where <tt>k = -1</tt> is the last
  # character consumed. <tt>k = 0</tt> has undefined behavior and returns +nil+
  #
  def peek( k = 1 )
    k == 0 and return nil
    k += 1 if k < 0
    index = @position + k - 1
    index < 0 and return nil
    @data[ index ] or EOF
  end

  #
  # return a substring around the stream cursor at a distance +k+
  # if <tt>k >= 0</tt>, return the next k characters
  # if <tt>k < 0</tt>, return the previous <tt>|k|</tt> characters
  #
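  # e.g. (hypothetical): with input "hello" and the cursor at position 2,
  # <tt>through( 2 )</tt> returns <tt>"ll"</tt> and <tt>through( -2 )</tt>
  # returns <tt>"he"</tt>
  #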
  def through( k )
    if k >= 0 then @string[ @position, k ] else
      start = ( @position + k ).at_least( 0 ) # start cannot be negative or index will wrap around
      @string[ start ... @position ]
    end
  end

  # operator style look-ahead
  alias >> look

  # operator style look-behind
  def <<( k )
    self >> -k
  end

  alias index position
  alias character_index position

  alias source_name name

  #
  # Returns true if the stream appears to be at the beginning of a new line.
  # This is an extra utility method for use inside lexer actions if needed.
  #
  def beginning_of_line?
    @position.zero? or @data[ @position - 1 ] == NEWLINE
  end

  #
  # Returns true if the stream appears to be at the end of a line.
  # This is an extra utility method for use inside lexer actions if needed.
  #
  def end_of_line?
    @data[ @position ] == NEWLINE #if @position < @data.length
  end

  #
  # Returns true if the stream has been exhausted.
  # This is an extra utility method for use inside lexer actions if needed.
  #
  def end_of_string?
    @position >= @data.length
  end

  #
  # Returns true if the stream appears to be at the beginning of the stream (position = 0).
  # This is an extra utility method for use inside lexer actions if needed.
  #
  def beginning_of_string?
    @position == 0
  end

  alias eof? end_of_string?
  alias bof? beginning_of_string?

  #
  # record the current stream location parameters in the stream's marker table and
  # return an integer-valued bookmark that may be used to restore the stream's
  # position with the #rewind method. This method is used to implement backtracking.
  #
  def mark
    state = [ @position, @line, @column ].freeze
    @markers << state
    return @markers.length - 1
  end

  #
  # restore the stream to an earlier location recorded by #mark. If no marker value is
  # provided, the last marker generated by #mark will be used.
  #
  def rewind( marker = @markers.length - 1, release = true )
    ( marker >= 0 and location = @markers[ marker ] ) or return( self )
    @position, @line, @column = location
    release( marker ) if release
    return self
  end

  #
  # the total number of markers currently in existence
  #
  def mark_depth
    @markers.length
  end

  #
  # the last marker value created by a call to #mark
  #
  def last_marker
    @markers.length - 1
  end

  #
  # let go of the bookmark data for the marker and all marker
  # values created after the marker.
  #
  def release( marker = @markers.length - 1 )
    marker.between?( 1, @markers.length - 1 ) or return
    @markers.pop( @markers.length - marker )
    return self
  end

  #
  # jump to the absolute position value given by +index+.
  # note: if +index+ is before the current position, the +line+ and +column+
  #       attributes of the stream will probably be incorrect
  #
  def seek( index )
    index = index.bound( 0, @data.length )  # ensures index is within the stream's range
    if index > @position
      skipped = through( index - @position )
      if lc = skipped.count( "\n" ) and lc.zero?
        @column += skipped.length
      else
        @line += lc
        @column = skipped.length - skipped.rindex( "\n" ) - 1
      end
    end
    @position = index
    return nil
  end

  #
  # customized object inspection that shows:
  # * the stream class
  # * the stream's location in <tt>index / line:column</tt> format
  # * +before_chars+ characters before the cursor (6 characters by default)
  # * +after_chars+ characters after the cursor (10 characters by default)
  #
  def inspect( before_chars = 6, after_chars = 10 )
    before = through( -before_chars ).inspect
    @position - before_chars > 0 and before.insert( 0, '... ' )

    after = through( after_chars ).inspect
    @position + after_chars + 1 < @data.length and after << ' ...'

    location = "#@position / line #@line:#@column"
    "#<#{ self.class }: #{ before } | #{ after } @ #{ location }>"
  end

  #
  # return the string slice between position +start+ and +stop+
  #
  def substring( start, stop )
    @string[ start, stop - start + 1 ]
  end

  #
  # identical to String#[]
  #
  def []( start, *args )
    @string[ start, *args ]
  end
end


=begin rdoc ANTLR3::FileStream

FileStream is a character stream that uses data stored in some external file. It
is nearly identical to StringStream, except that it uses the data located in a file
while automatically setting up the +source_name+ and +line+ parameters. It does
not actually use any buffered IO operations throughout the stream navigation
process. Instead, it reads the file data once when the stream is initialized.
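
For example (a sketch only -- the file path shown is hypothetical):

  chars = ANTLR3::FileStream.new( 'input/expression.txt' )
  chars.source_name   # => "input/expression.txt"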

=end

class FileStream < StringStream

  #
  # creates a new FileStream object using the given +file+ object.
  # If +file+ is a path string, the file will be read, its contents
  # will be used, and the +name+ attribute will be set to the path.
  # If +file+ is an IO-like object (that responds to :read),
  # the content of the object will be used and the stream will
  # attempt to set its +name+ attribute, first trying the method #name
  # on the object and then trying the method #path.
  #
  # see StringStream.new for a list of additional options
  # the constructor accepts
  #
  def initialize( file, options = {} )
    case file
    when $stdin then
      data = $stdin.read
      @name = '(stdin)'
    when ARGF
      data = file.read
      @name = file.path
    when ::File then
      file = file.clone
      file.reopen( file.path, 'r' )
      @name = file.path
      data = file.read
      file.close
    else
      if file.respond_to?( :read )
        data = file.read
        if file.respond_to?( :name ) then @name = file.name
        elsif file.respond_to?( :path ) then @name = file.path
        end
      else
        @name = file.to_s
        if test( ?f, @name ) then data = File.read( @name )
        else raise ArgumentError, "could not find an existing file at %p" % @name
        end
      end
    end
    super( data, options )
  end

end

=begin rdoc ANTLR3::CommonTokenStream

CommonTokenStream serves as the primary token stream implementation for feeding
sequential token input into parsers.

Using some TokenSource (such as a lexer), the stream collects a token sequence,
setting each token's <tt>index</tt> attribute to indicate its position
within the stream. The stream may be tuned to some channel value; off-channel
tokens will be filtered out by the #peek, #look, and #consume methods.

=== Sample Usage


  source_input = ANTLR3::StringStream.new("35 * 4 - 1")
  lexer = Calculator::Lexer.new(source_input)
  tokens = ANTLR3::CommonTokenStream.new(lexer)

  # assume this grammar defines whitespace as tokens on channel HIDDEN
  # and numbers and operations as tokens on channel DEFAULT
  tokens.look         # => 0 INT['35'] @ line 1 col 0 (0..1)
  tokens.look(2)      # => 2 MULT["*"] @ line 1 col 3 (3..3)
  tokens.tokens(0, 2)
    # => [0 INT["35"] @line 1 col 0 (0..1),
    #     1 WS[" "] @line 1 col 2 (2..2),
    #     2 MULT["*"] @ line 1 col 3 (3..3)]
    # notice the #tokens method does not filter off-channel tokens

  lexer.reset
  hidden_tokens =
    ANTLR3::CommonTokenStream.new(lexer, :channel => ANTLR3::HIDDEN)
  hidden_tokens.look # => 1 WS[' '] @ line 1 col 2 (2..2)

=end

class CommonTokenStream
  include TokenStream
  include Enumerable

  #
  # constructs a new token stream using the +token_source+ provided. +token_source+ is
  # usually a lexer, but can be any object that implements +next_token+ and includes
  # ANTLR3::TokenSource.
  #
  # If a block is provided, each token harvested will be yielded and if the block
  # returns a +nil+ or +false+ value, the token will not be added to the stream --
  # it will be discarded.
  #
  # === Options
  # [:channel] The channel value the stream should be tuned to initially
  # [:source_name] The source name (file name) attribute of the stream
  #
  # === Example
  #
  #   # create a new token stream that is tuned to channel :comment, and
  #   # discard all WHITE_SPACE tokens
  #   ANTLR3::CommonTokenStream.new(lexer, :channel => :comment) do |token|
  #     token.name != 'WHITE_SPACE'
  #   end
  #
  def initialize( token_source, options = {} )
    case token_source
    when CommonTokenStream
      # this is useful in cases where you want to convert a CommonTokenStream
      # to a TokenRewriteStream or other variation of the standard token stream
      stream = token_source
      @token_source = stream.token_source
      @channel = options.fetch( :channel ) { stream.channel or DEFAULT_CHANNEL }
      @source_name = options.fetch( :source_name ) { stream.source_name }
      tokens = stream.tokens.map { | t | t.dup }
    else
      @token_source = token_source
      @channel = options.fetch( :channel, DEFAULT_CHANNEL )
      @source_name = options.fetch( :source_name ) {  @token_source.source_name rescue nil }
      tokens = @token_source.to_a
    end
    @last_marker = nil
    @tokens = block_given? ? tokens.select { | t | yield( t, self ) } : tokens
    @tokens.each_with_index { |t, i| t.index = i }
    @position =
      if first_token = @tokens.find { |t| t.channel == @channel }
        @tokens.index( first_token )
      else @tokens.length
      end
  end

  #
  # resets the token stream and rebuilds it with a potentially new token source.
  # If no +token_source+ value is provided, the stream will attempt to reset the
  # current +token_source+ by calling +reset+ on the object. The stream will
  # then clear the token buffer and attempt to harvest new tokens. As with
  # CommonTokenStream.new, if a block is provided, tokens will be
  # yielded and discarded if the block returns a +false+ or +nil+ value.
  #
  def rebuild( token_source = nil )
    if token_source.nil?
      @token_source.reset rescue nil
    else @token_source = token_source
    end
    @tokens = block_given? ? @token_source.select { |token| yield( token ) } :
                             @token_source.to_a
    @tokens.each_with_index { |t, i| t.index = i }
    @last_marker = nil
    @position =
      if first_token = @tokens.find { |t| t.channel == @channel }
        @tokens.index( first_token )
      else @tokens.length
      end
    return self
  end

  #
  # tune the stream to a new channel value
  #
  def tune_to( channel )
    @channel = channel
  end

  def token_class
    @token_source.token_class
  rescue NoMethodError
    @position == -1 and fill_buffer
    @tokens.empty? ? CommonToken : @tokens.first.class
  end

  alias index position

  def size
    @tokens.length
  end

  alias length size

  ###### State-Control ################################################

  #
  # rewind the stream to its initial state
  #
  def reset
    @position = 0
    @position += 1 while token = @tokens[ @position ] and
                         token.channel != @channel
    @last_marker = nil
    return self
  end

  #
  # bookmark the current position of the input stream
  #
  def mark
    @last_marker = @position
  end

  def release( marker = nil )
    # do nothing
  end


  def rewind( marker = @last_marker, release = true )
    seek( marker )
  end

  #
  # saves the current stream position, yields to the block,
  # and then ensures the stream's position is restored before
  # returning the value of the block
  #
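  # e.g. (hypothetical):
  #
  #   tokens.hold { tokens.consume until tokens.peek == ANTLR3::EOF }
  #   # afterward, the stream position is back where it was before the block ran
  #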
  def hold( pos = @position )
    block_given? or return enum_for( :hold, pos )
    begin
      yield
    ensure
      seek( pos )
    end
  end

  ###### Stream Navigation ###########################################

  #
  # advance the stream one step to the next on-channel token
  #
  def consume
    token = @tokens[ @position ] || EOF_TOKEN
    if @position < @tokens.length
      @position = future?( 2 ) || @tokens.length
    end
    return( token )
  end

  #
  # jump to the stream position specified by +index+
  # note: seek does not check whether or not the
  #       token at the specified position is on-channel
  #
  def seek( index )
    @position = index.to_i.bound( 0, @tokens.length )
    return self
  end

  #
  # return the type of the on-channel token at look-ahead distance +k+. <tt>k = 1</tt> represents
  # the current token. +k+ greater than 1 represents upcoming on-channel tokens. A negative
  # value of +k+ returns previous on-channel tokens consumed, where <tt>k = -1</tt> is the last
  # on-channel token consumed. <tt>k = 0</tt> has undefined behavior and returns +nil+
  #
  def peek( k = 1 )
    tk = look( k ) and return( tk.type )
  end

  #
  # operates similarly to #peek, but returns the full token object at look-ahead position +k+
  #
  def look( k = 1 )
    index = future?( k ) or return nil
    @tokens.fetch( index, EOF_TOKEN )
  end

  alias >> look
  def << k
    self >> -k
  end

  #
  # returns the index of the on-channel token at look-ahead position +k+ or nil if no other
  # on-channel tokens exist
  #
  def future?( k = 1 )
    @position == -1 and fill_buffer

    case
    when k == 0 then nil
    when k < 0 then past?( -k )
    when k == 1 then @position
    else
      # since the stream only yields on-channel
      # tokens, the stream can't just go to the
      # next position, but rather must skip
      # over off-channel tokens
      ( k - 1 ).times.inject( @position ) do |cursor, |
        begin
          tk = @tokens.at( cursor += 1 ) or return( cursor )
          # ^- if tk is nil (i.e. cursor is outside the array limits)
        end until tk.channel == @channel
        cursor
      end
    end
  end

  #
  # returns the index of the on-channel token at look-behind position +k+ or nil if no other
  # on-channel tokens exist before the current token
  #
  def past?( k = 1 )
    @position == -1 and fill_buffer

    case
    when k == 0 then nil
    when @position - k < 0 then nil
    else

      k.times.inject( @position ) do |cursor, |
        begin
          cursor <= 0 and return( nil )
          tk = @tokens.at( cursor -= 1 ) or return( nil )
        end until tk.channel == @channel
        cursor
      end

    end
  end

  #
  # yields each token in the stream (including off-channel tokens)
  # If no block is provided, the method returns an Enumerator object.
  # #each accepts the same arguments as #tokens
  #
  def each( *args )
    block_given? or return enum_for( :each, *args )
    tokens( *args ).each { |token| yield( token ) }
  end


  #
  # yields each token in the stream with the given channel value
  # If no channel value is given, the stream's tuned channel value will be used.
  # If no block is given, an enumerator will be returned.
  #
  def each_on_channel( channel = @channel )
    block_given? or return enum_for( :each_on_channel, channel )
    for token in @tokens
      token.channel == channel and yield( token )
    end
  end

  #
  # iterates through the token stream, yielding each on-channel token along the way.
  # After iteration has completed, the stream's position will be restored to where
  # it was before #walk was called. While #each or #each_on_channel does not change
  # the stream's position during iteration, #walk advances through the stream. This
  # makes it possible to look ahead and behind the current token during iteration.
  # If no block is given, an enumerator will be returned.
  #
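  # e.g. (hypothetical):
  #
  #   tokens.walk { |tk| puts( "%p (type %d)" % [ tk.text, tk.type ] ) }
  #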
  def walk
    block_given? or return enum_for( :walk )
    initial_position = @position
    begin
      while token = look and token.type != EOF
        consume
        yield( token )
      end
      return self
    ensure
      @position = initial_position
    end
  end

  #
  # returns a copy of the token buffer. If +start+ and +stop+ are provided, tokens
  # returns a slice of the token buffer from <tt>start..stop</tt>. The parameters
  # are converted to integers with their <tt>to_i</tt> methods, and thus tokens
  # can be provided to specify start and stop. If a block is provided, tokens are
  # yielded and filtered out of the return array if the block returns a +false+
  # or +nil+ value.
  #
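  # e.g. (hypothetical): fetch the first three tokens, keeping only those on
  # the default channel
  #
  #   tokens.tokens( 0, 2 ) { |tk| tk.channel == ANTLR3::DEFAULT_CHANNEL }
  #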
  def tokens( start = nil, stop = nil )
    stop.nil?  || stop >= @tokens.length and stop = @tokens.length - 1
    start.nil? || start < 0 and start = 0
    tokens = @tokens[ start..stop ]

    if block_given?
      tokens.delete_if { |t| not yield( t ) }
    end

    return( tokens )
  end


  def at( i )
    @tokens.at i
  end

  #
  # identical to Array#[], as applied to the stream's token buffer
  #
  def []( i, *args )
    @tokens[ i, *args ]
  end

  ###### Standard Conversion Methods ###############################
  def inspect
    string = "#<%p: @token_source=%p @ %p/%p" %
      [ self.class, @token_source.class, @position, @tokens.length ]
    tk = look( -1 ) and string << " #{ tk.inspect } <--"
    tk = look( 1 ) and string << " --> #{ tk.inspect }"
    string << '>'
  end

  #
  # fetches the text content of all tokens between +start+ and +stop+ and
  # joins the chunks into a single string
  #
  def extract_text( start = 0, stop = @tokens.length - 1 )
    start = start.to_i.at_least( 0 )
    stop = stop.to_i.at_most( @tokens.length )
    @tokens[ start..stop ].map! { |t| t.text }.join( '' )
  end

  alias to_s extract_text

end

end