= Hacking
== Get, install
Basic use of the package is just `go get`, or `git clone` followed by
`go install`. There are no dependencies outside the standard library.
== Build
CI is currently on travis-ci.org.
The build runs go vet with a few exceptions for things I'm not a big fan of.
https://github.com/client9/misspell has been valuable.
Also I wrote https://github.com/soniakeys/vetc to validate that each source
file has a copyright/license statement.
Then, it's not in the CI script, but I wrote https://github.com/soniakeys/rcv
to put coverage stats in the readme. Maybe it could be a commit hook or
something, but for now I'll try just running it manually now and then.
Go fmt is not in the CI script, but I have at least one editor set up to run
it on save, so code should stay formatted pretty well.
== Examples with random output
The math/rand generators with constant seeds used to give consistent numbers
across Go versions and so some examples relied on this. Sometime after Go 1.9
though the numbers changed. The technique for now is to go ahead and write
those examples, get them working, then change the `// Output:` line to
`// Random output:`. This keeps them showing in go doc but keeps them from
being run by go test. This works for now. It might be revisited at some
point.
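
As a minimal sketch of that technique, the example below prints a seeded permutation and marks its output as random. The helper `randomPerm` and the example name are hypothetical and not taken from the package, and the numbers in the output comment are placeholders rather than a recorded run. In a real package the example function would sit in a `_test.go` file; `main` is included here only so the sketch runs on its own.

```go
package main

import (
	"fmt"
	"math/rand"
)

// randomPerm is a hypothetical helper, not part of the graph package.
// It returns a permutation of 0..n-1 drawn from a generator with a
// constant seed.
func randomPerm(seed int64, n int) []int {
	r := rand.New(rand.NewSource(seed))
	return r.Perm(n)
}

// Example_randomOutput would normally live in a _test.go file. The comment
// at the end begins with "Random output:" instead of "Output:", so go test
// compiles the function but does not run it or compare what it prints.
// The numbers listed there are placeholders, not a recorded run.
func Example_randomOutput() {
	fmt.Println(randomPerm(3, 5))
	// Random output:
	// [2 0 4 1 3]
}

func main() {
	Example_randomOutput()
}
```

Changing the marker back to `// Output:` would make go test run the example again and compare the printed text with the comment.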
== Plans
The primary to-do list is the issue tracker on GitHub.
== Direction, focus, features
The project started with no real goal or purpose, just as a place for some code
that might be useful. Here are some elements that characterize the direction.
* The focus has been on algorithms on adjacency lists. That is, adjacency list
is the fundamental representation for most implemented algorithms. There are
many other interesting representations, many reasons to use them, but
adjacency list is common in literature and practice. It has been useful to
focus on this data representation, at first anyway.
* The focus has been on single threaded algorithms. Again, there is much new
and interesting work being done with concurrent, parallel, and distributed
graph algorithms, and Go might be an excellent language to implement some of
these algorithms. But as a preliminary step, more traditional
single-threaded algorithms are implemented.
* The focus has been on static finite graphs. Again there is much interesting
work in online algorithms, dynamic graphs, and infinite graphs, but these
are not generally considered here.
* Algorithms selected for implementation are generally ones commonly appearing
in beginning graph theory discussions and in general purpose graph libraries
in other programming languages. With these as drivers, there's a big risk of
developing a library of curiosities and academic exercises rather than a
library of practical utility. But well, it's a start. The hope is that
there are some practical drivers behind graph theory and behind other graph
libraries.
* There is active current research going on in graph algorithm development.
One goal for this library is to implement newer and faster algorithms.
In some cases where it seems not too much work, older/classic/traditional
algorithms may be implemented for comparison. These generally go in the
alt subdirectory.
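
To make the first three points concrete, here is a small sketch of a static, finite graph stored as an adjacency list in slices, with a single-threaded traversal over it. The type `adjacencyList` and the function `reachable` are local stand-ins written for this note; they are assumptions for illustration and are not claimed to match the package's own types or methods.

```go
package main

import "fmt"

// adjacencyList is a local stand-in for illustration, not the package's
// own type. The neighbors of node i are the elements of g[i].
type adjacencyList [][]int

// reachable counts the nodes reachable from start, including start itself,
// using an iterative depth-first search.
func reachable(g adjacencyList, start int) int {
	visited := make([]bool, len(g))
	visited[start] = true
	stack := []int{start}
	count := 0
	for len(stack) > 0 {
		n := stack[len(stack)-1]
		stack = stack[:len(stack)-1]
		count++
		for _, to := range g[n] {
			if !visited[to] {
				visited[to] = true
				stack = append(stack, to)
			}
		}
	}
	return count
}

func main() {
	// Directed graph with arcs 0->1, 0->2, 1->2, 3->0.
	g := adjacencyList{
		0: {1, 2},
		1: {2},
		2: {},
		3: {0},
	}
	fmt.Println(reachable(g, 0)) // 3
	fmt.Println(reachable(g, 3)) // 4
}
```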
== General principles
* The API is rather low level.
* Slices instead of maps. Maps are pretty efficient, and the property of
unique keys can be useful, but slices are still faster and more efficient,
and the unique key property is not always needed or wanted. The adjacency
list implementation of this library is all done in slices. Slices are used
in algorithms where possible, in preference to maps. Maps are still used in
some cases where uniqueness is needed.
* Interfaces not generally used. Algorithms are implemented directly on
concrete data types and not on interfaces describing the capabilities of
the data types. The abstraction of interfaces is a nice match to graph
theory and the convenience of running graph algorithms on any type that
implements an interface is appealing, but the costs seem too high to me.
Slices are rich with capabilities that get hidden behind interfaces and
direct slice manipulation is always faster than going through interfaces.
An impediment for programs using the library is that they will generally
have to implement a mapping from slice indexes to their application data,
often including, for example, some other form of node ID. It seems fair
to push this burden outside the graph library; the library cannot know
the needs of this mapping.
* Bitsets are widely used, particularly to store one bit of information per
node of a graph. I used math/big at first but then moved to a dense bitset
of my own. Yes, I considered other third-party bitsets but had my own
feature set I wanted. A slice of bools is another alternative. Bools will
be faster in almost all cases but the bitset will use less memory. I'm
choosing size over speed for now.
* Code generation is used to provide methods that work on both labeled and
unlabeled graphs. Code is written to labeled types, then transformations
generate the unlabeled equivalents.
* Methods are named for what they return rather than what they do, where
reasonable anyway.
* Consistency in method signature and behavior across corresponding methods,
for example directed/undirected, labeled/unlabeled, again, as long as it's
reasonable.
* Sometimes in tension with the consistency principle, methods are lazy about
datatypes of parameters and return values. Sometimes a value might have
different reasonable representations: a set might be a bitset, map, slice
of bools, or slice of set members, for example. Methods will take and return
whatever is convenient for them and not convert the form just for consistency
or to try to guess what a caller would prefer.
* Methods return multiple results for whatever the algorithm produces that
might be of interest. Sometimes an algorithm will have a primary result but
then some secondary values that also might be of interest. If they are
already computed as a byproduct of the algorithm, or can be computed at
negligible cost, return them.
* Sometimes in conflict with the multiple result principle, methods will not
speculatively compute secondary results if there is any significant cost
and if the secondary result can be just as easily computed later.
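
Several of these principles can be sketched together. The code below is a rough illustration under stated assumptions, not the library's API. The caller keeps its own mapping from application IDs to slice indexes. The graph itself is plain slices. The visited set is a dense bitset with one bit per node. The search returns both the set and a count it computes as a byproduct. All names here (`bits`, `index`, `reached`) are hypothetical.

```go
package main

import "fmt"

// bits is a hypothetical dense bitset, one bit per node and 64 nodes per
// word. It stands in for illustration; it is not the library's bitset.
type bits []uint64

func newBits(n int) bits      { return make(bits, (n+63)/64) }
func (b bits) set(i int)      { b[i/64] |= uint64(1) << (uint(i) % 64) }
func (b bits) get(i int) bool { return b[i/64]&(uint64(1)<<(uint(i)%64)) != 0 }

// index maps application node IDs (strings here) to slice indexes and
// back. This mapping belongs to the calling program, not the graph code.
type index struct {
	toIndex map[string]int
	toID    []string
}

// node returns the slice index for id, assigning the next index if the
// id has not been seen before.
func (x *index) node(id string) int {
	if n, ok := x.toIndex[id]; ok {
		return n
	}
	n := len(x.toID)
	x.toIndex[id] = n
	x.toID = append(x.toID, id)
	return n
}

// reached runs a breadth-first search from start. The visited set is the
// primary result; count is a byproduct returned because it costs nothing.
func reached(g [][]int, start int) (visited bits, count int) {
	visited = newBits(len(g))
	visited.set(start)
	queue := []int{start}
	for len(queue) > 0 {
		n := queue[0]
		queue = queue[1:]
		count++
		for _, to := range g[n] {
			if !visited.get(to) {
				visited.set(to)
				queue = append(queue, to)
			}
		}
	}
	return visited, count
}

func main() {
	x := &index{toIndex: map[string]int{}}
	arcs := [][2]string{{"a", "b"}, {"b", "c"}, {"d", "a"}}
	var g [][]int
	for _, a := range arcs {
		from, to := x.node(a[0]), x.node(a[1])
		for len(g) < len(x.toID) {
			g = append(g, nil) // grow the graph to cover every known node
		}
		g[from] = append(g[from], to)
	}
	visited, count := reached(g, x.node("a"))
	fmt.Println(count) // 3
	for n, id := range x.toID {
		fmt.Println(id, visited.get(n)) // a true, b true, c true, d false
	}
}
```

A slice of bools would work in place of `bits` with no other changes to `reached`; the bitset trades some speed for an eighth of the memory.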
== Code Maintenance
There are tons of cut and paste variants. There's the basic AdjacencyList,
then Directed and Undirected variants, then Labeled variants of each of those.
Code gen helps avoid some cut and paste but there's a bunch that doesn't
code gen very well and so is duplicated with cut and paste. In particular
the testable examples in the _test files don't code gen well and so are pretty much
all duplicated by hand. If you change code, think about where there should
be variants and go look to see if the variants need similar changes.
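
A rough sketch of why variants need attention follows. It uses stand-in types, not the package's own definitions, and the method `hasArc` is hypothetical. The two methods differ only in how they read the destination of an arc, so a fix to one almost always applies to the other.

```go
package main

import "fmt"

// Unlabeled variant: each arc is just a destination node.
// These types are illustrative stand-ins, not the package's definitions.
type adjacencyList [][]int

// Labeled variant: each arc carries a destination node and a label.
type half struct {
	to    int
	label int
}

type labeledAdjacencyList [][]half

// hasArc reports whether the unlabeled graph has an arc from -> to.
func (g adjacencyList) hasArc(from, to int) bool {
	for _, t := range g[from] {
		if t == to {
			return true
		}
	}
	return false
}

// hasArc for the labeled variant is the same loop with h.to in place of t.
// A change to one of these two methods should be mirrored in the other.
func (g labeledAdjacencyList) hasArc(from, to int) bool {
	for _, h := range g[from] {
		if h.to == to {
			return true
		}
	}
	return false
}

func main() {
	u := adjacencyList{{1}, {2}, {}}
	l := labeledAdjacencyList{{{to: 1, label: 7}}, {{to: 2, label: 9}}, {}}
	fmt.Println(u.hasArc(0, 1), l.hasArc(0, 1)) // true true
	fmt.Println(u.hasArc(2, 0), l.hasArc(2, 0)) // false false
}
```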