136 lines
6.8 KiB
Text
136 lines
6.8 KiB
Text
|
= Hacking
|
|||
|
|
|||
|
== Get, install
|
|||
|
Basic use of the package is just go get, or git clone; go install. There are
|
|||
|
no dependencies outside the standard library.
|
|||
|
|
|||
|
== Build
|
|||
|
CI is currently on travis-ci.org.
|
|||
|
|
|||
|
The build runs go vet with a few exceptions for things I'm not a big fan of.
|
|||
|
|
|||
|
https://github.com/client9/misspell has been valuable.
|
|||
|
|
|||
|
Also I wrote https://github.com/soniakeys/vetc to validate that each source
|
|||
|
file has copyright/license statement.
|
|||
|
|
|||
|
Then, it’s not in the ci script, but I wrote https://github.com/soniakeys/rcv
|
|||
|
to put coverage stats in the readme. Maybe it could be commit hook or
|
|||
|
something but for now I’ll try just running it manually now and then.
|
|||
|
|
|||
|
Go fmt is not in the ci script, but I have at least one editor set up to run
|
|||
|
it on save, so code should stay formatted pretty well.
|
|||
|
|
|||
|
== Examples with random output
|
|||
|
The math/rand generators with constant seeds used to give consistent numbers
|
|||
|
across Go versions and so some examples relied on this. Sometime after Go 1.9
|
|||
|
though the numbers changed. The technique for now is to go ahead and write
|
|||
|
those examples, get them working, then change the `// Output:` line to
|
|||
|
`// Random output:`. This keeps them showing in go doc but keeps them from
|
|||
|
being run by go test. This works for now. It might be revisited at some
|
|||
|
point.
|
|||
|
|
|||
|
== Plans
|
|||
|
The primary to-do list is the issue tracker on Github.
|
|||
|
|
|||
|
== Direction, focus, features
|
|||
|
The project started with no real goal or purpose, just as a place for some code
|
|||
|
that might be useful. Here are some elements that characterize the direction.
|
|||
|
|
|||
|
* The focus has been on algorithms on adjacency lists. That is, adjacency list
|
|||
|
is the fundamental representation for most implemented algorithms. There are
|
|||
|
many other interesting representations, many reasons to use them, but
|
|||
|
adjacency list is common in literature and practice. It has been useful to
|
|||
|
focus on this data representation, at first anyway.
|
|||
|
|
|||
|
* The focus has been on single threaded algorithms. Again, there is much new
|
|||
|
and interesting work being done with concurrent, parallel, and distributed
|
|||
|
graph algorithms, and Go might be an excellent language to implement some of
|
|||
|
these algorithms. But as a preliminary step, more traditional
|
|||
|
single-threaded algorithms are implemented.
|
|||
|
|
|||
|
* The focus has been on static finite graphs. Again there is much interesting
|
|||
|
work in online algorithms, dynamic graphs, and infinite graphs, but these
|
|||
|
are not generally considered here.
|
|||
|
|
|||
|
* Algorithms selected for implementation are generally ones commonly appearing
|
|||
|
in beginning graph theory discussions and in general purpose graph libraries
|
|||
|
in other programming languages. With these as drivers, there's a big risk
|
|||
|
developing a library of curiosities and academic exercises rather than a
|
|||
|
library of practical utility. But well, it's a start. The hope is that
|
|||
|
there are some practical drivers behind graph theory and behind other graph
|
|||
|
libraries.
|
|||
|
|
|||
|
* There is active current research going on in graph algorithm development.
|
|||
|
One goal for this library is to implement newer and faster algorithms.
|
|||
|
In some cases where it seems not too much work, older/classic/traditional
|
|||
|
algorithms may be implemented for comparison. These generally go in the
|
|||
|
alt subdirectory.
|
|||
|
|
|||
|
== General principles
|
|||
|
* The API is rather low level.
|
|||
|
|
|||
|
* Slices instead of maps. Maps are pretty efficient, and the property of
|
|||
|
unique keys can be useful, But slices are still faster and more efficient,
|
|||
|
and the unique key property is not always needed or wanted. The Adjacency
|
|||
|
list implementation of this library is all done in slices. Slices are used
|
|||
|
in algorithms where possible, in preference to maps. Maps are still used in
|
|||
|
some cases where uniqueness is needed.
|
|||
|
|
|||
|
* Interfaces not generally used. Algorithms are implemented directly on
|
|||
|
concrete data types and not on interfaces describing the capabilities of
|
|||
|
the data types. The abstraction of interfaces is a nice match to graph
|
|||
|
theory and the convenience of running graph algorithms on any type that
|
|||
|
implements an interface is appealing, but the costs seem too high to me.
|
|||
|
Slices are rich with capababilites that get hidden behind interfaces and
|
|||
|
direct slice manipulation is always faster than going through interfaces.
|
|||
|
An impedance for programs using the library is that they will generally
|
|||
|
have to implement a mapping from slice indexes to their application data,
|
|||
|
often including for example, some other form of node ID. This seems fair
|
|||
|
to push this burden outside the graph library; the library cannot know
|
|||
|
the needs of this mapping.
|
|||
|
|
|||
|
* Bitsets are widely used, particularly to store one bit of information per
|
|||
|
node of a graph. I used math/big at first but then moved to a dense bitset
|
|||
|
of my own. Yes, I considered other third-party bitsets but had my own
|
|||
|
feature set I wanted. A slice of bools is another alternative. Bools will
|
|||
|
be faster in almost all cases but the bitset will use less memory. I'm
|
|||
|
chosing size over speed for now.
|
|||
|
|
|||
|
* Code generation is used to provide methods that work on both labeled and
|
|||
|
unlabeled graphs. Code is written to labeled types, then transformations
|
|||
|
generate the unlabled equivalents.
|
|||
|
|
|||
|
* Methods are named for what they return rather than what they do, where
|
|||
|
reasonable anyway.
|
|||
|
|
|||
|
* Consistency in method signature and behavior across corresponding methods,
|
|||
|
for example directed/undirected, labeled/unlabeled, again, as long as it's
|
|||
|
reasonable.
|
|||
|
|
|||
|
* Sometimes in tension with the consistency principle, methods are lazy about
|
|||
|
datatypes of parameters and return values. Sometimes a vale might have
|
|||
|
different reasonable representations, a set might be a bitset, map, slice
|
|||
|
of bools, or slice of set members for example. Methods will take and return
|
|||
|
whatever is convenient for them and not convert the form just for consistency
|
|||
|
or to try to guess what a caller would prefer.
|
|||
|
|
|||
|
* Methods return multiple results for whatever the algorithm produces that
|
|||
|
might be of interest. Sometimes an algorithm will have a primary result but
|
|||
|
then some secondary values that also might be of interest. If they are
|
|||
|
already computed as a byproduct of the algorithm, or can be computed at
|
|||
|
negligible cost, return them.
|
|||
|
|
|||
|
* Sometimes in conflict with the multiple result principle, methods will not
|
|||
|
speculatively compute secondary results if there is any significant cost
|
|||
|
and if the secondary result can be just as easily computed later.
|
|||
|
|
|||
|
== Code Maintenance
|
|||
|
There are tons of cut and paste variants. There's the basic AdjacencyList,
|
|||
|
then Directed and Undirected variants, then Labeled variants of each of those.
|
|||
|
Code gen helps avoid some cut and paste but there's a bunch that doesn't
|
|||
|
code gen very well and so is duplicated with cut and paste. In particular
|
|||
|
the testable examples in the _test files don't cg well and so are pretty much
|
|||
|
all duplicated by hand. If you change code, think about where there should
|
|||
|
be variants and go look to see if the variants need similar changes.
|