Finish documenting `gen/` in the development manual

This necessitated documenting algorithms from `lib2.bc` as well as
documenting `strgen.c` thoroughly.

Signed-off-by: Gavin Howard <gavin@yzena.com>
computed_goto
Gavin Howard 1 year ago
parent 6787a0fcd9
commit d72062c26f
Signed by: gavin
GPG Key ID: C08038BDF280D33E
  1. gen/strgen.c (108 changes)
  2. gen/strgen.sh (5 changes)
  3. manuals/algorithms.md (148 changes)
  4. manuals/development.md (40 changes)

@ -67,40 +67,79 @@ static const char* const bc_gen_name_extern = "extern const char %s[];\n\n";
#define INVALID_PARAMS (3)
// This is the max width to print characters to the screen. This is to ensure
// that lines don't go over 80 characters.
#define MAX_WIDTH (74)
// that lines don't go much over 80 characters.
#define MAX_WIDTH (72)
/**
* Open a file. This function is to smooth over differences between POSIX and
* Windows.
* @param f A pointer to the FILE pointer that will be initialized.
* @param filename The name of the file.
* @param mode The mode to open the file in.
*/
static void open_file(FILE** f, const char* filename, const char* mode) {
#ifndef _WIN32
*f = fopen(filename, mode);
#else // _WIN32
// We want the file pointer to be NULL on failure, but fopen_s() is not
// guaranteed to set it.
*f = NULL;
fopen_s(f, filename, mode);
#endif // _WIN32
}
/**
* Outputs a label, which is a string literal that the code can use as a name
* for the file that is being turned into a string. This is important for the
* math libraries because the parse and lex code expects a filename. The label
* becomes the filename for the purposes of lexing and parsing.
*
* The label is generated from bc_gen_label (above). It has the form:
*
* const char *<label_name> = <label>;
*
* This function is also needed to smooth out differences between POSIX and
* Windows, specifically, the fact that Windows uses backslashes for filenames
* and that backslashes have to be escaped in a string literal.
*
* @param out The file to output to.
* @param label The label name.
* @param name The actual label text, which is a filename.
* @return Positive if no error, negative on error, just like *printf().
*/
static int output_label(FILE* out, const char* label, const char* name) {
#ifndef _WIN32
return fprintf(out, bc_gen_label, label, name);
#else // _WIN32
size_t i, count = 0, len = strlen(name);
char* buf;
int ret;
for (i = 0; i < len; ++i) {
count += (name[i] == '\\');
}
// This loop counts how many backslashes there are in the label.
for (i = 0; i < len; ++i) count += (name[i] == '\\');
buf = (char*) malloc(len + 1 + count);
if (buf == NULL) return -1;
count = 0;
// This loop is the meat of the Windows version. What it does is copy the
// label byte-for-byte, unless it encounters a backslash, in which case, it
// copies the backslash twice to have it escaped properly in the string
// literal.
for (i = 0; i < len; ++i) {
buf[i + count] = name[i];
if (name[i] == '\\') {
count += 1;
buf[i + count] = name[i];
@ -118,6 +157,51 @@ static int output_label(FILE* out, const char* label, const char* name) {
#endif // _WIN32
}
/**
* This program generates C strings (well, actually, C char arrays) from text
* files. It generates 1 C source file. The resulting file has this structure:
*
* <Copyright Header>
*
* [<Label Extern>]
*
* <Char Array Extern>
*
* [<Preprocessor Guard Begin>]
* [<Label Definition>]
*
* <Char Array Definition>
* [<Preprocessor Guard End>]
*
* Anything surrounded by square brackets may not be in the final generated
* source file.
*
* The required parameters are:
*
* @param input Input filename.
* @param output Output filename.
* @param name The name of the char array.
*
* The optional parameters are:
*
* @param label If given, a label for the char array. See the comment for the
* output_label() function. It is meant as a "filename" for the
* text when processed by bc and dc. If label is given, then the
* <Label Extern> and <Label Definition> will exist in the
* generated source file.
* @param define If given, a preprocessor macro that should be used as a guard
* for the char array and its label. If define is given, then
* <Preprocessor Guard Begin> will exist in the form
* "#if <define>" as part of the generated source file, and
* <Preprocessor Guard End> will exist in the form
* "#endif // <define>".
* @param rmtabs If this parameter exists, it must be an integer. If it is
* non-zero, then tabs are removed from the input file text
* before outputting to the output char array.
*
* All text files that are transformed have license comments. This program finds
* the end of that comment and strips it out as well.
*/
int main(int argc, char *argv[]) {
FILE *in, *out;
@ -125,8 +209,9 @@ int main(int argc, char *argv[]) {
int c, count, slashes, err = IO_ERR;
bool has_label, has_define, remove_tabs;
if (argc < 5) {
printf("usage: %s input output name [label [define [remove_tabs]]]\n", argv[0]);
if (argc < 4) {
printf("usage: %s input output name [label [define [rmtabs]]]\n",
argv[0]);
return INVALID_PARAMS;
}
@ -155,18 +240,24 @@ int main(int argc, char *argv[]) {
c = count = slashes = 0;
// This is where the end of the license comment is found.
while (slashes < 2 && (c = fgetc(in)) >= 0) {
slashes += (slashes == 1 && c == '/' && fgetc(in) == '\n');
slashes += (!slashes && c == '/' && fgetc(in) == '*');
}
// The file is invalid if the end of the license comment could not be found.
if (c < 0) {
err = INVALID_INPUT_FILE;
goto err;
}
// Do not put extra newlines at the beginning of the char array.
while ((c = fgetc(in)) == '\n');
// This loop is what generates the actual char array. It counts how many
// chars it has printed per line in order to insert newlines at appropriate
// places. It also skips tabs if they should be removed.
while (c >= 0) {
int val;
@ -189,6 +280,7 @@ int main(int argc, char *argv[]) {
c = fgetc(in);
}
// Make sure the end looks nice and insert the NUL byte at the end.
if (!count && (fputc(' ', out) == EOF || fputc(' ', out) == EOF)) goto err;
if (fprintf(out, "0\n};\n") < 0) goto err;

@ -32,8 +32,11 @@ export LC_CTYPE=C
progname=${0##*/}
# See strgen.c comment on main() for what these mean. Note, however, that this
# script generates a string literal, not a char array. To understand the
# consequences of that, see manuals/development.md#strgenc.
if [ $# -lt 3 ]; then
echo "usage: $progname input output name [label [define [remove_tabs]]]"
echo "usage: $progname input output name [label [define [rmtabs]]]"
exit 1
fi

@ -62,8 +62,10 @@ a complexity of `O((n*log(n))^log_2(3))` which is favorable to the
This `bc` implements the fast algorithm [Newton's Method][4] (also known as the
Newton-Raphson Method, or the [Babylonian Method][5]) to perform the square root
operation. Its complexity is `O(log(n)*n^2)` as it requires one division per
iteration, and it doubles the amount of correct digits per iteration.
operation.
Its complexity is `O(log(n)*n^2)` as it requires one division per iteration, and
it doubles the amount of correct digits per iteration.
### Sine and Cosine (`bc` Math Library Only)
@ -103,7 +105,9 @@ to calculate `e^x`. Since this only works when `x` is small, it uses
e^x = (e^(x/2))^2
```
to reduce `x`. It has a complexity of `O(n^3)`.
to reduce `x`.
It has a complexity of `O(n^3)`.
**Note**: this series can also produce errors of 1 ULP, so I recommend users do
their calculations with the precision (`scale`) set to at least 1 greater than
@ -124,7 +128,9 @@ and uses the relation
ln(x^2) = 2 * ln(x)
```
to sufficiently reduce `x`. It has a complexity of `O(n^3)`.
to sufficiently reduce `x`.
It has a complexity of `O(n^3)`.
**Note**: this series can also produce errors of 1 ULP, so I recommend users do
their calculations with the precision (`scale`) set to at least 1 greater than
@ -179,6 +185,137 @@ exponentiation. The complexity is `O(e*n^2)`, which may initially seem
inefficient, but `n` is kept small by maintaining small numbers. In practice, it
is extremely fast.
### Non-Integer Exponentiation (`bc` Math Library 2 Only)
This is implemented in the function `p(x,y)`.
The algorithm uses the formula `e(y*l(x))`.
It has a complexity of `O(n^3)` because both `e()` and `l()` do.
### Rounding (`bc` Math Library 2 Only)
This is implemented in the function `r(x,p)`.
The algorithm is a simple method to check if rounding away from zero is
necessary, and if so, adds `1e10^p`.
It has a complexity of `O(n)` because of add.
### Ceiling (`bc` Math Library 2 Only)
This is implemented in the function `ceil(x,p)`.
The algorithm is a simple add of one less decimal place than `p`.
It has a complexity of `O(n)` because of add.
### Factorial (`bc` Math Library 2 Only)
This is implemented in the function `f(n)`.
The algorithm is a simple multiplication loop.
It has a complexity of `O(n^3)` because it performs a linear number of `O(n^2)`
multiplications.
### Permutations (`bc` Math Library 2 Only)
This is implemented in the function `perm(n,k)`.
The algorithm uses the formula `n!/(n-k)!`.
It has a complexity of `O(n^3)` because of the division and factorials.
### Combinations (`bc` Math Library 2 Only)
This is implemented in the function `comb(n,r)`.
The algorithm uses the formula `n!/(r!*(n-r)!)`.
It has a complexity of `O(n^3)` because of the division and factorials.
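As a rough illustration of the factorial, permutation, and combination formulas, here is a minimal sketch in C. It is not the `bc` library code: the names `fact_u64()`, `perm_u64()`, and `comb_u64()` are hypothetical, and it uses fixed-width integers instead of arbitrary precision, so it only works for `n <= 20`.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical helper: n! as a simple multiplication loop.
// 20! is the largest factorial that fits in a uint64_t.
static uint64_t fact_u64(unsigned n) {

	uint64_t result = 1;
	unsigned i;

	assert(n <= 20);

	for (i = 2; i <= n; ++i) result *= i;

	return result;
}

// Permutations: n!/(n-k)!
static uint64_t perm_u64(unsigned n, unsigned k) {
	assert(k <= n);
	return fact_u64(n) / fact_u64(n - k);
}

// Combinations: n! divided by the whole product r!*(n-r)!. The product never
// exceeds n!, so it cannot overflow when n! itself fits.
static uint64_t comb_u64(unsigned n, unsigned r) {
	assert(r <= n);
	return fact_u64(n) / (fact_u64(r) * fact_u64(n - r));
}
```

For example, `perm_u64(5, 2)` is 20 and `comb_u64(5, 2)` is 10.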
### Logarithm of Any Base (`bc` Math Library 2 Only)
This is implemented in the function `log(x,b)`.
The algorithm is to use the formula `l(x)/l(b)` with double the `scale` because
there is no good way of knowing how many digits of precision are needed when
switching bases.
It has a complexity of `O(n^3)` because of the division and `l()`.
### Logarithm of Base 2 (`bc` Math Library 2 Only)
This is implemented in the function `l2(x)`.
This is a convenience wrapper around `log(x,2)`.
### Logarithm of Base 10 (`bc` Math Library 2 Only)
This is implemented in the function `l10(x)`.
This is a convenience wrapper around `log(x,10)`.
### Root (`bc` Math Library 2 Only)
This is implemented in the function `root(x,n)`.
The algorithm is [Newton's method][9]. The initial guess is calculated as
`10^ceil(length(x)/n)`.
Like square root, its complexity is `O(log(n)*n^2)` as it requires one division
per iteration, and it doubles the number of correct digits per iteration.
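The iteration can be sketched in C with doubles. This is only an illustration under assumptions, not the `lib2.bc` code: the names `ipow()` and `root_newton()` are hypothetical, `x` must be positive, and the digit count used for the initial guess only covers the integer part of `x`, whereas `length(x)` in `bc` counts all significant digits.

```c
#include <assert.h>

// Hypothetical helper: b^e by repeated multiplication.
static double ipow(double b, unsigned e) {
	double r = 1.0;
	while (e--) r *= b;
	return r;
}

// Newton's method for the n-th root of x (x > 0, n > 0).
static double root_newton(double x, unsigned n) {

	unsigned digits = 1, exp10, i;
	double t, guess;

	assert(x > 0.0 && n > 0);

	// Count the digits in the integer part of x (an assumption standing in
	// for bc's length()).
	for (t = x; t >= 10.0; t /= 10.0) ++digits;

	// Initial guess: 10^ceil(digits/n), which is never below the real root.
	exp10 = (digits + n - 1) / n;
	guess = ipow(10.0, exp10);

	// Each step needs one division. The iteration cap is a safety net.
	for (i = 0; i < 1000; ++i) {

		double next = ((double) (n - 1) * guess + x / ipow(guess, n - 1)) /
		              (double) n;
		double diff = next > guess ? next - guess : guess - next;

		guess = next;

		if (diff <= 1e-15 * guess) break;
	}

	return guess;
}
```

For example, `root_newton(27.0, 3)` starts from a guess of 10 and settles on a value very close to 3 after a handful of iterations.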
### Cube Root (`bc` Math Library 2 Only)
This is implemented in the function `cbrt(x)`.
This is a convenience wrapper around `root(x,3)`.
### Greatest Common Divisor (`bc` Math Library 2 Only)
This is implemented in the function `gcd(a,b)`.
The algorithm is an iterative version of the [Euclidean Algorithm][10].
It has a complexity of `O(n^4)` because it has a linear number of divisions.
This function ensures that `a` is always bigger than `b` before starting the
algorithm.
### Least Common Multiple (`bc` Math Library 2 Only)
This is implemented in the function `lcm(a,b)`.
The algorithm uses the formula `a*b/gcd(a,b)`.
It has a complexity of `O(n^4)` because of `gcd()`.
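For readers unfamiliar with the iterative Euclidean Algorithm, a minimal sketch in C follows. The names `gcd_u64()` and `lcm_u64()` are hypothetical, and fixed-width integers stand in for `bc` numbers. The sketch divides before multiplying in the least common multiple, which gives the same result as `a*b/gcd(a,b)` but is less likely to overflow a fixed-width type.

```c
#include <assert.h>
#include <stdint.h>

// Iterative Euclidean Algorithm.
static uint64_t gcd_u64(uint64_t a, uint64_t b) {

	// Make sure a is the bigger of the two before starting.
	if (a < b) {
		uint64_t t = a;
		a = b;
		b = t;
	}

	// One division (remainder) per iteration.
	while (b != 0) {
		uint64_t r = a % b;
		a = b;
		b = r;
	}

	return a;
}

// Least common multiple, equivalent to a*b/gcd(a,b).
static uint64_t lcm_u64(uint64_t a, uint64_t b) {
	if (a == 0 || b == 0) return 0;
	return a / gcd_u64(a, b) * b;
}
```

For example, `gcd_u64(12, 18)` is 6 and `lcm_u64(4, 6)` is 12.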
### Pi (`bc` Math Library 2 Only)
This is implemented in the function `pi(s)`.
The algorithm uses the formula `4*a(1)`.
It has a complexity of `O(n^3)` because of arctangent.
### Tangent (`bc` Math Library 2 Only)
This is implemented in the function `t(x)`.
The algorithm uses the formula `s(x)/c(x)`.
It has a complexity of `O(n^3)` because of sine, cosine, and division.
### Atan2 (`bc` Math Library 2 Only)
This is implemented in the function `a2(y,x)`.
The algorithm uses the [standard formulas][11].
It has a complexity of `O(n^3)` because of arctangent.
[1]: https://en.wikipedia.org/wiki/Karatsuba_algorithm
[2]: https://en.wikipedia.org/wiki/Long_division
[3]: https://en.wikipedia.org/wiki/Exponentiation_by_squaring
@ -187,3 +324,6 @@ is extremely fast.
[6]: https://en.wikipedia.org/wiki/Unit_in_the_last_place
[7]: https://people.eecs.berkeley.edu/~wkahan/LOG10HAF.TXT
[8]: https://en.wikipedia.org/wiki/Modular_exponentiation#Memory-efficient_method
[9]: https://en.wikipedia.org/wiki/Root-finding_algorithms#Newton's_method_(and_similar_derivative-based_methods)
[10]: https://en.wikipedia.org/wiki/Euclidean_algorithm
[11]: https://en.wikipedia.org/wiki/Atan2#Definition_and_computation

@ -246,9 +246,32 @@ However, tabs at the beginning of lines are kept for two reasons:
For more details about the algorithms used, see the [algorithms manual][25].
#### `lib2.bc`
However, there are a few snares for unwary programmers.
First, all constants must be one digit. This is because otherwise, multi-digit
constants could be interpreted wrongly if the user uses a different `ibase`.
This does not happen with single-digit numbers because they are guaranteed to be
interpreted as the number they would be if the `ibase` were as high as possible.
This is why `A` is used in the library instead of `10`, and things like `2*9*A`
for `180` in [`lib2.bc`][26].
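To make this snare concrete, here is a small model in C of how a constant's value depends on `ibase`. It is a sketch under assumptions, not the parser in `bc`: the names `digit_value()` and `parse_constant()` are hypothetical, it handles only non-negative integers, it assumes an ASCII-like character set, and it requires every digit of a multi-digit constant to be valid in the given base.

```c
#include <assert.h>
#include <string.h>

// Value of one digit character: '0'-'9', then 'A'-'Z' for 10-35.
// Returns -1 for anything else.
static int digit_value(char c) {
	if (c >= '0' && c <= '9') return c - '0';
	if (c >= 'A' && c <= 'Z') return c - 'A' + 10;
	return -1;
}

// Hypothetical model of constant parsing.
static unsigned long parse_constant(const char* s, unsigned ibase) {

	size_t i, len = strlen(s);
	unsigned long value = 0;

	assert(len > 0 && ibase >= 2);

	// A single digit has the same value no matter what ibase is.
	if (len == 1) {
		int d = digit_value(s[0]);
		assert(d >= 0);
		return (unsigned long) d;
	}

	// A multi-digit constant depends on ibase.
	for (i = 0; i < len; ++i) {
		int d = digit_value(s[i]);
		assert(d >= 0 && (unsigned) d < ibase);
		value = value * ibase + (unsigned long) d;
	}

	return value;
}
```

In this model, `parse_constant("10", 16)` is 16, not 10, while `parse_constant("A", 16)` and `parse_constant("A", 2)` are both 10.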
As an alternative, you can set `ibase` in the function, but if you do, make sure
to set it with a single-digit number and beware the snare below...
Second, `scale`, `ibase`, and `obase` must be safely restored before returning
from any function in the library. This is because without the `-g` option,
functions are allowed to change any of the globals.
Third, all local variables in a function must be declared in an `auto` statement
before doing anything else. This includes arrays. However, function parameters
are considered predeclared.
TODO: Document algorithms.
Fourth, and this is only a snare for `lib.bc`, not [`lib2.bc`][26], the code
must not use *any* extensions. It has to work when users use the `-s` or `-w`
flags.
#### `lib2.bc`
A `bc` script containing the [extended math library][7].
@ -257,9 +280,10 @@ extraneous whitespace, except for tabs at the beginning of lines.
For more details about the algorithms used, see the [algorithms manual][25].
#### `strgen.c`
Also, be sure to check [`lib.bc`][8] for the snares that can trip up unwary
programmers when writing code for `lib2.bc`.
TODO: Document actual source file.
#### `strgen.c`
Code for the program to generate C strings from text files. This is the original
program, although [`strgen.sh`][9] was added later.
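The core idea of turning text into a char array can be sketched as below. This is an illustration only, not the code in `strgen.c`: the function `emit_char_values()` is hypothetical, it writes into a caller-supplied buffer instead of a file, and the exact output layout (decimal values separated by commas, with a line break once a line reaches `SKETCH_MAX_WIDTH` characters) is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

// Assumed to mirror MAX_WIDTH; the name is hypothetical.
#define SKETCH_MAX_WIDTH (72)

// Writes each char of text as a decimal value followed by a comma, then a
// final 0 for the NUL byte. Returns 0 on success, -1 if buf is too small.
static int emit_char_values(const char* text, bool rmtabs, char* buf,
                            size_t size)
{
	size_t i, used = 0, width = 0, len = strlen(text);
	int n;

	assert(size > 0);

	for (i = 0; i < len; ++i) {

		// Skip tabs if they should be removed.
		if (rmtabs && text[i] == '\t') continue;

		// Break the line once it has reached the maximum width.
		if (width >= SKETCH_MAX_WIDTH) {
			if (size - used < 2) return -1;
			buf[used++] = '\n';
			width = 0;
		}

		n = snprintf(buf + used, size - used, "%d,",
		             (int) (unsigned char) text[i]);
		if (n < 0 || (size_t) n >= size - used) return -1;

		used += (size_t) n;
		width += (size_t) n;
	}

	// The NUL byte at the end of the char array.
	n = snprintf(buf + used, size - used, "0");
	if (n < 0 || (size_t) n >= size - used) return -1;

	return 0;
}
```

With tab removal turned on, the input `"a\tb"` produces `97,98,0`. With it turned off, the same input produces `97,9,98,0`.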
@ -289,8 +313,6 @@ takes, and how it works.
#### `strgen.sh`
TODO: Document actual source file.
An `sh` script that generates C strings using only POSIX utilities. This
exists for those situations where a host C99 compiler is not available, and the
environment limits mentioned above in [`strgen.c`][15] don't matter.
@ -304,6 +326,8 @@ how it works.
* Document all code assumptions with asserts.
* Document all functions with Doxygen comments.
* Compilers and their quirks, as well as warning settings on Clang.
* My vim-bc repo.
* The purpose of every file.
* How locale works.
* How locales are installed.
@ -312,7 +336,8 @@ how it works.
* How all manpage versions are generated.
* Fuzzing.
* Including my `tmuxp` files.
* Can't use `libdislocator.so`.
* Can't use `libdislocator.so`. It causes crashes when it can't allocate
memory.
* Use `AFL_HARDEN` during build for hardening.
* Use `CC=afl-clang-lto` and `CFLAGS="-flto"`.
@ -341,3 +366,4 @@ how it works.
[23]: #history
[24]: https://clang.llvm.org/docs/ClangFormat.html
[25]: ./algorithms.md
[26]: #lib2bc
