This was causing problems with the map validity, but it wasn't the map's
problem; it was GAML's. This is because the string in a map would point
to a place that GAML used twice. Ouch.
Signed-off-by: Gavin D. Howard <gavin@yzena.com>
I forgot that U+FFFE is the byte-swapped BOM; the BOM itself is U+FEFF.
The Unicode replacement character is U+FFFD, so that's what was really
returned. I changed the code to return an error and to have the lexer
handle it appropriately.
Signed-off-by: Gavin Howard <gavin@yzena.com>
This is done by adding a struct that holds the codepoint, the bytes, and
the length of the codepoint in UTF-8 bytes. Then, this struct is used
everywhere.
Wherever the bytes are needed, the bytes are used. Whenever a straight
value is needed, the codepoint is used.
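As a rough sketch of what such a struct could look like (the names
utf8_char and utf8_char_from_codepoint are hypothetical, not the ones in
the tree, and the real layout may differ):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: the codepoint, its UTF-8 bytes, and the length. */
typedef struct utf8_char {
    uint32_t codepoint; /* decoded value, used for comparisons */
    uint8_t bytes[4];   /* encoded UTF-8 bytes, used when copying out */
    uint8_t len;        /* number of valid entries in bytes (1 to 4) */
} utf8_char;

/* Build the struct from a codepoint by encoding it as UTF-8. */
utf8_char utf8_char_from_codepoint(uint32_t cp) {
    utf8_char c = { cp, { 0, 0, 0, 0 }, 0 };

    /* Surrogates and values above U+10FFFF are not valid scalar values. */
    assert(cp <= 0x10FFFF && (cp < 0xD800 || cp > 0xDFFF));

    if (cp < 0x80) {
        c.bytes[0] = (uint8_t) cp;
        c.len = 1;
    } else if (cp < 0x800) {
        c.bytes[0] = (uint8_t) (0xC0 | (cp >> 6));
        c.bytes[1] = (uint8_t) (0x80 | (cp & 0x3F));
        c.len = 2;
    } else if (cp < 0x10000) {
        c.bytes[0] = (uint8_t) (0xE0 | (cp >> 12));
        c.bytes[1] = (uint8_t) (0x80 | ((cp >> 6) & 0x3F));
        c.bytes[2] = (uint8_t) (0x80 | (cp & 0x3F));
        c.len = 3;
    } else {
        c.bytes[0] = (uint8_t) (0xF0 | (cp >> 18));
        c.bytes[1] = (uint8_t) (0x80 | ((cp >> 12) & 0x3F));
        c.bytes[2] = (uint8_t) (0x80 | ((cp >> 6) & 0x3F));
        c.bytes[3] = (uint8_t) (0x80 | (cp & 0x3F));
        c.len = 4;
    }

    return c;
}
```

A lexer would then compare against c.codepoint and copy c.bytes with
c.len when it needs the raw text.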
I stressed over this decision, but it was actually easier than I thought
to make the change. However, I'm glad I made it early because now I
don't have to worry about UTF-8 until I extend the language to allow any
kind of identifier/operator.
Signed-off-by: Gavin Howard <gavin@yzena.com>
Well, it's a function to get the length from the first byte. I used a
shift and a table lookup. The shift shrinks the table by a factor of 8,
and the lookup then gives the length directly. I'm hoping this will cut
down on the number of branches.
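A minimal sketch of the shift-and-lookup idea, assuming a shift of 3 and
a 32-entry table (the names utf8_len_table and utf8_len_from_first_byte
are hypothetical, and the actual table contents in the tree may differ):

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Indexed by (first_byte >> 3), so 32 entries instead of 256.
 * A 0 marks a byte that cannot start a sequence. This sketch does not
 * reject 0xC0, 0xC1, or 0xF5 to 0xF7; those need a separate check.
 */
static const uint8_t utf8_len_table[32] = {
    1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, /* 0x00-0x7F */
    0, 0, 0, 0, 0, 0, 0, 0, /* 0x80-0xBF: continuation bytes */
    2, 2, 2, 2,             /* 0xC0-0xDF: two-byte lead */
    3, 3,                   /* 0xE0-0xEF: three-byte lead */
    4,                      /* 0xF0-0xF7: four-byte lead */
    0                       /* 0xF8-0xFF: never valid */
};

/* Length of the UTF-8 sequence that starts with first, or 0 if none. */
size_t utf8_len_from_first_byte(uint8_t first) {
    return utf8_len_table[first >> 3];
}
```

The top five bits of the first byte are enough to tell the length
classes apart, which is why the low three bits can be shifted away.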
Signed-off-by: Gavin Howard <gavin@yzena.com>
The module is serialization/deserialization. This is in preparation for
putting JSON in the module, as well as binary
serialization/deserialization for Yvm.
Signed-off-by: Gavin Howard <gavin@yzena.com>
Before this commit, GAML code actually depended on Yao code. This commit
splits them so that GAML has its own token type and code.
Signed-off-by: Gavin Howard <gavin@yzena.com>
I was going to turn these into tests, but lexing and parsing have been
refactored so much that the paths have probably changed too much, so it
would probably be more useful to redo the fuzzing from scratch again and
regenerate the outputs.
Signed-off-by: Gavin Howard <gavin@yzena.com>
The module is called "lang". It implements the lexer, lexer variables,
lexer modes, errors, lexer files, and basic parsing.
Signed-off-by: Gavin Howard <gavin@yzena.com>
These bugs come from changes to the map where it panics if an item does
not exist. The mappool just assumed that the map returned NULL, which it
did previously.
This commit uses the "exists" functions to check and to grab the value,
if any.
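The pattern looks roughly like this. This is a toy map with hypothetical
names (toy_map, toy_map_exists, toy_map_at, toy_lookup_or), not the real
map or mappool API:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for the real map: a small array of key/value pairs. */
typedef struct toy_map {
    const char* keys[8];
    int vals[8];
    size_t len;
} toy_map;

/* Returns true and stores a pointer to the value in *val if key exists. */
bool toy_map_exists(toy_map* m, const char* key, int** val) {
    for (size_t i = 0; i < m->len; ++i) {
        if (strcmp(m->keys[i], key) == 0) {
            *val = &m->vals[i];
            return true;
        }
    }
    return false;
}

/* Strict lookup: a missing key is a caller bug, so abort, not NULL. */
int* toy_map_at(toy_map* m, const char* key) {
    int* val;
    if (!toy_map_exists(m, key, &val)) abort();
    return val;
}

/* Caller side: check and grab in one step instead of testing for NULL. */
int toy_lookup_or(toy_map* m, const char* key, int fallback) {
    int* val;
    return toy_map_exists(m, key, &val) ? *val : fallback;
}
```

A caller that may legitimately miss uses the "exists" form; the strict
form is left for places where a miss really is a bug.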
Signed-off-by: Gavin Howard <gavin@yzena.com>
This is done because I'm going to use them for my malloc()
implementation as well, so they won't be for stacks. They are also the
most low-level alloc code.
Signed-off-by: Gavin Howard <gavin@yzena.com>
I want future customers to use debug-enabled builds for better bug
reports, but I don't want them running asserts because of performance.
Signed-off-by: Gavin Howard <gavin@yzena.com>
Originally, both were under YC_DEBUG_CODE. That's kind of how it is in
my bc.
However, in Yc, a lot of validity checks are used, and that's not how
BC_DEBUG_CODE is used in bc. Instead, BC_DEBUG_CODE is used mostly for
printing during debugging.
I like my validity checks, though; they have proven most useful, and I'm
going to add more. So I decided that printing during debugging should be
exclusively under YC_DEBUG_PRINT.
So I split them.
This commit also changes a YC_DEBUG guard in lists to YC_DEBUG_CODE
because those validity checks should be able to be turned off.
I should add YC_DEBUG_CODE to my testing regimen specifically because
the validity checks will find *so* many bugs. And I should add *so* many
more.
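A sketch of how the split might be wired up. Only the macro names
YC_DEBUG_CODE and YC_DEBUG_PRINT come from this commit; YC_VALID,
YC_TRACE, and toy_list are hypothetical and just for illustration:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Validity checks: compiled in only when YC_DEBUG_CODE is defined. */
#ifdef YC_DEBUG_CODE
#define YC_VALID(expr) assert(expr)
#else
#define YC_VALID(expr) ((void) 0)
#endif

/* Debug printing: compiled in only when YC_DEBUG_PRINT is defined. */
#ifdef YC_DEBUG_PRINT
#define YC_TRACE(...) fprintf(stderr, __VA_ARGS__)
#else
#define YC_TRACE(...) ((void) 0)
#endif

/* Toy list to show both guards being used in one function. */
typedef struct toy_list {
    int items[4];
    int len;
} toy_list;

/* Returns 1 on success and 0 if the list is full. */
int toy_list_push(toy_list* l, int v) {
    YC_VALID(l != NULL && l->len >= 0 && l->len <= 4);
    if (l->len == 4) return 0;
    l->items[l->len++] = v;
    YC_TRACE("pushed %d, len now %d\n", v, l->len);
    return 1;
}
```

With this shape, building with -DYC_DEBUG_CODE turns on the checks
without any of the printing, and -DYC_DEBUG_PRINT does the reverse.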
Signed-off-by: Gavin Howard <gavin@yzena.com>
This is important for several reasons. The biggest is ABI. So yeah,
the compiler should be considered part of the platform.
Signed-off-by: Gavin Howard <gavin@yzena.com>