···11-(* The [Lexing] module keeps track of only byte offsets into the input. To get
22- line/column locations, the lexer usually has to call [Lexing.new_line] on
33- every newline character.
11+(* odoc uses an ocamllex lexer. The "engine" for such lexers is the standard
22+ [Lexing] module.
33+44+ As the [Lexing] module reads the input, it keeps track of only the byte
55+ offset into the input. It is normally the job of each particular lexer
66+ implementation to decide which character sequences count as newlines, and
77+ keep track of line/column locations. This is usually done by writing several
88+ extra regular expressions, and calling [Lexing.new_line] at the right time.
99+1010+ Keeping track of newlines like this makes the odoc lexer somewhat too
1111+ difficult to read, however. To factor the aspect of keeping track of newlines
1212+ fully out of the odoc lexer, instead of having it keep track of newlines as
1313+ it's scanning the input, the input is pre-scanned before feeding it into the
1414+ lexer. A table of all the newlines is assembled, and used to convert offsets
1515+ into line/column pairs after the lexer emits tokens.
41655- However, to keep the odoc lexer simple, it doesn't do that. Instead, this
66- function is given the input string, and it returns a function which converts
77- absolute offsets into the input into line/byte offset within line pairs. *)
88-let make_offset_to_location_function
99- : string -> (int -> Model.Location_.point) = fun s ->
1717+ [offset_to_location ~input ~comment_location offset] converts the byte
1818+ [offset], relative to the beginning of a comment, into a location, relative
1919+ to the beginning of the file containing the comment. [input] is the comment
2020+ text, and [comment_location] is the location of the comment within its file.
2121+ The function is meant to be partially applied to its first two arguments, at
2222+ which point it creates the table described above. The remaining function is
2323+ then passed to the lexer, so it can apply the table to its emitted tokens. *)
2424+let offset_to_location
2525+ : input:string -> comment_location:Lexing.position ->
2626+ (int -> Model.Location_.point) =
2727+ fun ~input ~comment_location ->
10281129 let rec find_newlines line_number input_index newlines_accumulator =
1212- if input_index >= String.length s then
3030+ if input_index >= String.length input then
1331 newlines_accumulator
1432 else
1515- if s.[input_index] = '\n' then
3333+ (* This is good enough to detect CR-LF also. *)
3434+ if input.[input_index] = '\n' then
1635 find_newlines
1736 (line_number + 1) (input_index + 1)
1837 ((line_number + 1, input_index + 1)::newlines_accumulator)
···2342 let reversed_newlines : (int * int) list =
2443 find_newlines 1 0 [(1, 0)] in
25442626- fun absolute_offset ->
4545+ fun byte_offset ->
2746 let rec scan_to_last_newline reversed_newlines_prefix =
2847 match reversed_newlines_prefix with
2948 | [] ->
3049 assert false
3131- | (line_number, line_start_offset)::prefix ->
3232- if line_start_offset <= absolute_offset then
3333- {
3434- Model.Location_.line = line_number;
3535- column = absolute_offset - line_start_offset
3636- }
5050+ | (line_in_comment, line_start_offset)::prefix ->
5151+ if line_start_offset > byte_offset then
5252+ scan_to_last_newline prefix
3753 else
3838- scan_to_last_newline prefix
5454+ let column_in_comment = byte_offset - line_start_offset in
5555+ let line_in_file =
5656+ line_in_comment + comment_location.Lexing.pos_lnum - 1 in
5757+ let column_in_file =
5858+ if line_in_comment = 1 then
5959+ column_in_comment +
6060+ comment_location.Lexing.pos_cnum -
6161+ comment_location.Lexing.pos_bol
6262+ else
6363+ column_in_comment
6464+ in
6565+ {Model.Location_.line = line_in_file; column = column_in_file}
3966 in
4067 scan_to_last_newline reversed_newlines
4168···4471let parse_comment
4572 ~permissive ~sections_allowed ~containing_definition ~location ~text =
46734747- (* Converts byte offsets into the comment to line, column pairs, which are
4848- relative to the start of the file that contains the comment. *)
4949- let offset_to_location =
5050- let offset_to_location_relative_to_start_of_comment =
5151- lazy (make_offset_to_location_function text) in
5252-5353- let offset_to_location_relative_to_start_of_file offset =
5454- let in_comment =
5555- (Lazy.force offset_to_location_relative_to_start_of_comment) offset in
5656-5757- let line_in_file = in_comment.line + location.Lexing.pos_lnum - 1 in
5858- let offset_in_line =
5959- if in_comment.line = 1 then
6060- in_comment.column + location.Lexing.pos_cnum - location.Lexing.pos_bol
6161- else
6262- in_comment.column
6363- in
6464-6565- {Model.Location_.line = line_in_file; column = offset_in_line}
6666- in
6767-6868- offset_to_location_relative_to_start_of_file
6969- in
7070-7174 let token_stream =
7275 let lexbuf = Lexing.from_string text in
7676+ let offset_to_location =
7777+ offset_to_location ~input:text ~comment_location:location in
7378 let input : Lexer.input =
7479 {
7580 file = location.Lexing.pos_fname;