
Writing High-Quality, Well-Scoped Commits

In modern software projects, a clean Git history is more than just an aesthetic choice – it is a cornerstone of maintainable, collaborative development. Practitioners across the industry (from Linux kernel maintainers to Google’s Angular team) advocate for atomic commits: each commit being a focused, self-contained change that can stand on its own.

This article outlines widely accepted conventions and best practices for writing such commits in a language-agnostic context, drawing on official guidelines and community consensus. We discuss:

  • Keeping commits small
  • Ensuring each commit builds and passes tests
  • Making commits easy to revert or reorder
  • Including related updates (like documentation and tests) in the same commit

Relevant examples and scenarios are provided to illustrate these principles.

Keep Commits Focused

Make each commit an atomic unit of change. Commits should encapsulate a single logical change or fix. As the official Git documentation advises, you should “split your changes into small logical steps, and commit each of them”1. Each commit ought to be justifiable on its own merits and easily understood in isolation2. In practice, this means you should avoid bundling unrelated modifications into one commit. For example, if you are fixing two separate bugs or adding two independent features, handle them in separate commits (or even separate branches) rather than a single combined commit3. Small, focused commits make it easier for others to review your work and to trace through the project history later (e.g. using git blame or git log)1.

Commit related changes together, and nothing more. A commit should be a wrapper for related changes only3. This is a common theme in many guides, from the Linux kernel’s Submitting Patches guide to internal team wikis. The Linux kernel documentation explicitly states: “Separate each logical change into a separate patch.” For instance, if a change includes a bug fix and a performance improvement, split those into two patches (commits)2. Conversely, if one logical change requires edits across multiple files, those can be grouped in one commit2. The rule of thumb is that each commit addresses one idea or issue. This helps reviewers verify the change and ensures the commit is self-explanatory.

Example: If you are refactoring a module and adding a new feature in that module, do it in two commits. The first commit might be titled “refactor(auth): simplify token validation logic” containing only the cleanup/refactoring, and the second commit “feat(auth): add support for multi-factor login” containing the new feature implementation. This separation makes it clear what each change does and allows one to be reverted or adjusted without affecting the other.

Commit often to keep changes small. Embracing a workflow of frequent commits helps achieve this granularity. Rather than coding an entire feature over several days and committing at the end, break the work into smaller sub-tasks or milestones that can be committed independently. Commit each meaningful step – this might be every few hours or even more frequently. Regular commits prevent huge “omnibus” changes and reduce the chance of merge conflicts4. It’s often said that it’s easier to squash multiple small commits later than to split a gigantic commit. Git’s interactive rebase (git rebase -i) makes it possible to combine or edit commits after the fact, but splitting one large, tangled commit is very difficult. The Git project’s own workflow notes underscore this: don’t fear making too many small commits; you can always clean up history before merging. “It is always easier to squash a few commits together than to split one big commit into several”, and you can polish commit series with rebase before publishing1.
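
For example, cleaning up a topic branch with interactive rebase before publishing might look like the following sketch; the branch name and commit hashes are hypothetical:

# The topic branch has accumulated a few tiny follow-up commits:
git log --oneline main..feature/login-audit
#   a1b2c3d fix typo in audit message
#   d4e5f6a add missing audit log call
#   9f8e7d6 feat(auth): add login audit logging

# Squash the follow-ups into the main commit before publishing:
git rebase -i main
# In the editor, mark the two follow-up commits as "fixup" (or "squash")
# so the branch ends up as one well-scoped commit.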

Avoid mixing concerns in one commit. Any changes that are not directly related to each other should live in separate commits. For instance, do not bundle cosmetic changes (like reformatting code or fixing typos) with functional changes. If you happen to re-indent a code block or rename variables while fixing a bug, consider committing the refactoring first (or separately) and then the bug fix. This way, the bug fix commit’s diff is focused only on the substantive change, unclouded by whitespace or renaming noise. In large projects, maintainers appreciate this separation. The Linux kernel guidelines even give a specific example: if you need to move code from one file to another, do not modify that code in the same commit as the move – perform a pure move in one commit, and then make changes in a subsequent commit 2.
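
If you have already mixed such edits in your working tree, Git's interactive staging can still split them into separate commits. A minimal sketch, with hypothetical file names and messages:

# Stage only the hunks that belong to the bug fix:
git add -p src/parser.c          # answer "y" for the fix hunks, "n" for the reformatting
git commit -m "fix(parser): handle empty input without crashing"

# Then commit the leftover cosmetic changes separately:
git add src/parser.c
git commit -m "style(parser): re-indent and rename local variables"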

Ensure Commits Pass Tests

A hallmark of a well-scoped commit is that the project remains in a working, buildable state after that commit. In other words, if someone checks out any given commit from the history, the code should compile (or run) and ideally all tests should pass at that point. Keeping the repository in a buildable state at each commit is crucial for tools like git bisect, which relies on testing a range of commits to pinpoint regressions 2. If some intermediate commit breaks the build or test suite, it can frustrate developers trying to bisect, not to mention teammates who might check out that commit. The Linux kernel process documentation explicitly urges developers: “take special care to ensure that the kernel builds and runs properly after each patch in the series”, because someone may split your series at any point in a bisection2. Similarly, the official Git workflows guide says each commit should “pass the test suite, etc.” 1, underlining that each step in your commit history should maintain project integrity.

Don’t commit half-done work. You should only commit code when a logical chunk of functionality is completed and integrated. If your work is still in progress (e.g. a feature is only partially implemented, or tests are still failing), avoid committing that to the main branch. Instead, use a feature branch or even Git’s stash to save your work until it’s ready4. An oft-cited guideline is: “Don’t commit broken code”. Committing something that doesn’t even compile or that causes major parts of the application to fail will “make no sense” to others and may impede colleagues working on the project5. In a collaborative repository, pushing a commit that breaks the build or fails tests can block integration pipelines and annoy team members. Before committing, run the test suite. Ensure that all tests pass (or at least those not intentionally expected to fail). This discipline might require running a quick unit test command or a full CI pipeline locally. As one set of best practices puts it: “Resist the temptation to commit something that you think is completed – test it thoroughly to make sure it really is completed and has no side effects.” 4. By testing your code before you commit, you validate that the code in that commit is in a good state.
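
One way to make this discipline automatic is a pre-commit hook that runs the test suite before a commit is accepted. A minimal sketch, assuming the project's tests can be run with make test:

#!/bin/sh
# .git/hooks/pre-commit (enable with: chmod +x .git/hooks/pre-commit)
# Abort the commit if the project's test suite fails.
make test || {
    echo "Tests failed - commit aborted." >&2
    exit 1
}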

Maintain bisect-friendly history. If every commit builds and tests green, git bisect can be a powerful ally for tracking down issues. When you introduce a bug down the line, git bisect will check out commits one-by-one to find where things went wrong. If your commits are small and each is stable, bisect will cleanly pinpoint the first bad commit. However, if some commits are unstable (say, an intermediate commit that “half-implements” a feature and causes a test to fail), git bisect could be led astray. Unintentional test failures act like false signals during bisection6. For example, a commit that only adds a failing test (for a bug not yet fixed) introduces a red test in history; later the test passes when the fix is committed. From the perspective of bisect, the test going red in that commit might be mistaken for a regression, even though it was intentional. Because bisect assumes at most one transition from “good” to “bad”, a deliberately introduced failure can confuse it6. The safest practice is to avoid committing new failing tests or broken code in the main history altogether.
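
For illustration, a typical bisection session might look like the following; the version tag and test command are placeholders for whatever your project uses:

git bisect start
git bisect bad HEAD            # the current revision is known to be broken
git bisect good v1.4.0         # this release was known to be good
git bisect run make test       # let Git check out and test each candidate commit
# Git reports the first bad commit; then clean up:
git bisect reset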

Intentional breaking commits (rarely) and how to handle them. In general, every commit to a shared branch should keep the build green. There are rare cases, often in test-driven development (TDD) or complex feature rollouts, where one might commit a known failing test or temporarily break something with the intention to fix it in the next commit. For example, when fixing a bug, a developer might first add a test that exposes the bug (which fails), and then in the next commit implement the fix so the test passes. This two-step approach can make the reasoning clear: the first commit shows the problem, the second shows the solution. Important: Such patterns should be used with care. Many experts advise not to push the failing test commit to the main branch until it’s fixed5. In a team setting or on the main integration branch, it’s better to combine the test and fix in one atomic commit, or at least ensure the failing test is flagged (skipped) so it doesn’t break the build. If you do use a separate commit for a failing test (on a topic branch, for example), ensure it’s clearly intentional. Some teams mark the commit message with a tag like “[WIP]” or use a convention to indicate that the commit is part of an incomplete sequence. In summary, breaking the build or tests should never be an accident – only do it when absolutely necessary, and even then, communicate that intent clearly (e.g. via commit message or branch strategy).

Independent and Revertible Commits

Another key characteristic of a well-scoped commit is that it can be cleanly reverted or cherry-picked without entangling unrelated code. This comes naturally when commits are atomic (one change at a time) and each commit stands on its own. An independent commit means it does not secretly rely on subsequent commits to function. In other words, it works in the context of the codebase as of that commit. This independence also implies a degree of reorderability: if you have two or three separate atomic commits, the project state should not fundamentally break if their order is swapped or if one is omitted – assuming they don’t have direct dependencies on each other. (Of course, some sequences do build on each other; the goal is to minimize unnecessary coupling.)

Ensure commits can be undone singly. A good test of a commit’s isolation is to ask: if this commit were reverted later, would the codebase still make sense and build correctly? An atomic commit “should be able to be reverted (via git revert) and not cause any side effects or conflicts in other parts of the system”7. When a commit contains only one self-contained change, reverting it will cleanly remove that change. On the other hand, if a commit mixes multiple concerns or partial work, a revert might remove some needed pieces or conflict with later changes. Community wisdom emphasizes this: “An 'atomic' commit is easier to handle in case you want to revert, cherry-pick or merge it. The changes of the commit are clear and understandable.”8. For example, consider a commit that both renames a database column and changes the logic that uses that column. If a problem is later found with the logic change, you cannot revert that commit without also undoing the rename – they’re entangled. Two independent commits (one for the rename, one for the logic change) would have been revertible in isolation. Keeping changes orthogonal in this way provides a safety net: any single commit can be undone with minimal impact on the rest of the code.

Cherry-pick and reorder with confidence. Independent commits also allow you to reuse changes easily in other contexts. For instance, if you develop a feature on a branch that consists of five small commits, and later you realize one of those commits is actually a useful bugfix that should go into the release branch, you can cherry-pick that one commit onto the release branch. This will be painless if that commit doesn’t depend on the other four commits from the feature. Teams often find themselves needing to rearrange the order of commits (during rebase) or drop a commit from a series; atomic commits make this process straightforward. The Git workflows documentation notes that commits should work independently of later commits 1. That implies that if you stop the history at any given commit, everything up to that point works – and likewise, you could potentially reorder some commits without breaking things. While not every sequence of commits is reorderable (sometimes A must come before B), following the practice of one-per-change and keeping them buildable maximizes flexibility. In patch-driven projects like the Linux kernel, it’s acceptable for one patch to depend on another, but it should be noted in the description and each patch still must be justifiable on its own2. The takeaway is to minimize hidden interdependencies between commits. If commit X is meant to prepare for commit Y, ensure X by itself doesn’t put the system in a faulty state; it should lay groundwork harmlessly, and then Y builds on it. This way, maintainers could reorder or drop commits during integration if needed (for example, if one commit in a series isn’t ready, they might defer it and take the others).
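
As a concrete sketch of that scenario (branch names and hashes are hypothetical):

# One commit on the feature branch turns out to be a standalone bugfix:
git log --oneline main..feature/reporting
#   7c2e1aa feat(report): add CSV export
#   3b9d0cf fix(report): correct rounding of totals   <- needed in the release

# Apply just that commit to the release branch:
git checkout release/2.3
git cherry-pick 3b9d0cf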

Small commits simplify merges. When everyone commits in small, self-contained chunks, integrating changes via merges becomes easier as well. If a merge conflict occurs, it’s often easier to resolve when the commits involved are narrow in scope. Moreover, the chances of two developers editing the exact same lines or files in the same commit are lower if each commit is focused. In case of conflict, understanding what each side was doing (from the commit messages and diffs) is clearer with atomic commits. Finally, debugging issues is easier: if a bug is introduced, you can often pinpoint it to a single small commit. If needed, you can revert that one commit to fix the bug, rather than backing out a large grab-bag commit that also contained unrelated changes.

Tested and Documented Commits

A commit doesn’t just represent a code change; it represents a change in the state of the project. Therefore, any ancillary updates required by that change should be included in the same commit whenever practical. This ensures that the repository remains consistent and that anyone looking at that commit sees the full context of the change.

Update tests along with code. If your change is supposed to be covered by tests – for example, you fix a bug or add a new feature – consider adding or updating the tests in the same commit. An oft-quoted rule for atomic commits is that all necessary pieces go together. “If you add a new feature, the same commit should ideally also add automatic tests for the feature to ensure it won’t regress”9. The logic here is straightforward: the feature and its tests are one logical unit of work. They either both go in, or neither does. Shipping code without its tests in the same commit could mean that, for a brief period in history, the code is untested (or incorrectly tested), or that tests in a later commit might be testing behavior introduced in an earlier commit (making that earlier commit not fully verifiable on its own). By including tests in the commit that introduces the functionality, you guarantee that anyone checking out that commit can run the test suite and see tests passing, including the new ones. It also documents the intended behavior of the code at the moment it’s introduced. For example, if you fix a bug in function parseData(), add a test case in that commit demonstrating the bug is fixed – so future readers of the history can clearly see the before/after effect via that test.
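
As a sketch of this in practice (file names are hypothetical), staging the fix and its regression test together produces a single, fully verifiable commit:

# Stage the fix together with the test that proves it, then commit them as one unit:
git add src/parse_data.c tests/test_parse_data.c
git commit -m "fix(parser): handle trailing commas in parseData" \
           -m "Add a regression test that fails without this change."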

Keep documentation in sync. Documentation and configuration files should likewise be kept up-to-date with the code changes. If your commit changes how an API works, update the README, user guide, or code comments in that same commit to reflect the new behavior. If you’re adding a new feature that users should know about, it might warrant an entry in the CHANGELOG or release notes – don’t postpone that to some future commit. By updating docs as part of the change, you reduce the chance of forgetting to do it later. In the spirit of atomic commits: everything related to that change goes together. One common practice in many projects (including Angular and others) is to have a commit type for documentation changes (often docs:). This makes it clear that the commit affects documentation. For instance, a commit message might start with docs: update README with new configuration options. But importantly, if the documentation update is tied to a code change (e.g., adding a new config option), it can be part of the same commit that introduces that option, or a directly subsequent commit labeled as docs. In either case, it should be done before the change is considered “complete.” A good guideline is: if someone checked out your commit, would they have all they need to understand and use the change? If not, consider what’s missing (tests, docs, example config, etc.) and include it.

Ancillary files and metadata. Don’t overlook other repository files that might need updates. For example, if you add a new contributor in a project that tracks authors in a CONTRIBUTORS file, update that in the commit where appropriate. If you make a change that affects the build or deployment, update any configuration or build scripts as needed. In a documentation-driven project, if you change a function signature, update the relevant docs or even generated docs if they are version-controlled. All these practices ensure consistency. The project should not be in a state where the code says one thing but the README says another (at least not within a single commit’s snapshot of the repo). By keeping commits self-contained in this way, you again make it easier to revert or cherry-pick changes: you won’t accidentally revert code and leave stale documentation behind, or cherry-pick a feature without its necessary docs.

Example: Suppose you remove a deprecated command-line option from a CLI tool in a commit. A well-scoped commit for this would delete the code handling that option, update the help text or README to no longer mention it, and perhaps adjust any reference config files or tests that used it – all in one commit titled something like feat(cli): remove deprecated "verbose" option (BREAKING CHANGE). This commit fully encapsulates the removal of the feature. If later someone needs to revert this removal, the revert commit will bring back not just the code but also the relevant documentation and tests, keeping everything consistent.

Include CHANGELOG entries when relevant. Some repositories maintain a CHANGELOG.md manually. If your project does this, consider adding an entry in the changelog as part of the commit that introduces a notable change (especially for user-facing features or fixes). This way, you tie the changelog update to the change itself atomically. However, many modern workflows use automated generation of changelogs from commit messages (e.g., Conventional Commits with tools that aggregate feat and fix commits). If that’s the case, ensuring your commit message is descriptive and follows the convention is how you “update” the changelog. For example, Angular’s contributing guide and the Conventional Commits specification define commit message types like feat (feature), fix, docs, etc., and a special marker for breaking changes 10. Following such a convention helps downstream tools pick up your commit for release notes. For instance, if your commit introduces a breaking API change, including BREAKING CHANGE: in the commit message footer is an established practice10 – it signals to maintainers and automated tools that this commit is meant to introduce a deliberate breaking change in functionality.

Clear Commit Messages

(While this article focuses on commit content, it’s worth noting that good commit practices go hand-in-hand with good commit messages. A few guidelines are mentioned here for completeness.)

Always accompany your well-scoped commit with a clear, descriptive commit message. Many organizations have style guides for this. For example, the Angular project’s commit message format (which influenced the Conventional Commits standard) requires a structured message like <type>(<scope>): <short summary> 11. Even if you don’t follow a specific template, follow general best practices for messages: use a short summary line (50 characters is a common recommendation) describing what the commit does, and a longer body if needed to explain why and how. Use the imperative mood (“Fix bug” not “Fixed bug”), as if giving an order to the codebase3. This is consistent with messages generated by Git for merges or reverts and is widely adopted. For example, instead of writing “I added a check for null inputs”, write “Add check for null inputs”.

If your commit is one in a series, it can be helpful to mention relationships (e.g., “Part 1 of 3” or “Prerequisite for X feature”) in the body. And if the commit introduces a breaking change or deprecates something, explicitly call that out – some conventions use BREAKING CHANGE: in the message body to flag this10. This practice ensures that when scanning history, such commits stand out. It also ties back to the idea of intentional breaks in the test or API: by noting it in the message, you signal to everyone that the breakage is acknowledged, not accidental.

Example of a well-formatted commit message:

feat(api)!: remove deprecated endpoints

Remove the deprecated v1 API endpoints `GET /users` and `POST /submit`.
This commit deletes the code handling these endpoints and updates the API documentation and tests accordingly.

BREAKING CHANGE: Clients using the removed endpoints will receive HTTP 404 errors. They must migrate to the v2 endpoints introduced in version 2.0.

In this example, the commit title follows a convention (feat type with a ! to indicate a breaking change). It clearly states what the commit does. The body explains details – what was changed and why. It also notes that documentation and tests were updated (showing the commit is comprehensive), and explicitly calls out the breaking change for downstream users. Anyone reviewing the history can immediately grasp the impact of this commit.

Conclusion

Writing small, self-contained commits is a discipline that pays dividends in team productivity, code quality, and project longevity. By adhering to the conventions outlined above – one logical change per commit, always leaving the code in a working state, and including all relevant updates – you create a Git history that tells a coherent story. Such a history is easy to navigate, making debugging and code archaeology far less painful. It also facilitates smoother code reviews and collaboration, since each commit is focused and can be discussed in isolation.

These best practices are reflected in the guidelines of some of the most respected software communities. The Linux kernel, for instance, requires patches to be logically separated and bisect-friendly2, and projects like Angular enforce structured commit messages for clarity and automated tooling11. Tools and specs (Git itself, Conventional Commits, etc.) have evolved to encourage this style of work because it leads to more maintainable software. As a developer, making a habit of crafting atomic commits with clear messages is a mark of professionalism and care for your craft.

In summary, commit often, commit intentionally, and commit completely. Each commit should be an island of functionality: small but whole. By following these conventions, you ensure that any commit in your project’s history can be understood, built, tested, and if necessary, reverted or reused with confidence. This fosters a robust development workflow where changes are tracked and communicated effectively through version control. As the proverb goes, “take care of the pennies and the pounds will take care of themselves” – in Git terms, take care of the commits, and the codebase will take care of itself.

TL;DR

  • Make incremental changes in each commit, keeping them small: Write the smallest amount of code necessary to implement a single change or fix. This approach makes each update easy to test and roll back without affecting unrelated functionality12. Small commits also reduce the likelihood of merge conflicts by minimizing overlap with others’ work 12.
  • Limit each commit to one logical change or issue: Do not bundle unrelated fixes or features in the same commit. For example, if you fix a bug and improve performance, use separate commits for each; a commit should serve as a wrapper for related changes only 2,4. Keeping commits focused ensures they are easier to understand, review, and revert if something goes wrong4.
  • Keep commits atomic and self-contained: Ensure every commit is a single, self-sufficient unit of work (one task or fix) that can be applied or reverted in isolation without side effects12. An atomic commit encapsulates a complete thought – for instance, a code refactoring and a new feature should be in separate commits – which makes code reviews faster and revert operations safer12. Each commit should be justifiable on its own merits and not require later commits to be understandable2.
  • Avoid committing half-done work: Only commit when a piece of functionality or bug fix is fully implemented and tested. Incomplete code (such as a partial feature that doesn’t yet work) does not belong in the main commit history 4. If you need to checkpoint work or switch context, use a draft/WIP branch or Git’s stash feature rather than committing unfinished changes4. This ensures that every commit in the shared history represents a coherent, finished change.
  • Commit early and commit often: Frequent commits prevent individual commits from growing too large. By breaking development into quick, logical chunks, you make it easier for teammates to integrate changes regularly and avoid large merges 4. Short-lived branches with regular small commits minimize divergence from the main branch, reducing integration problems and easing collaboration12.
  • Ensure each commit builds and passes tests: Treat the repository as if it should be in a working state at every commit. Compile the code and run the test suite for the project after each commit to verify nothing is broken2. This guarantees that any commit can be checked out on its own and will function correctly, which is crucial for tools like git bisect and for safely reverting commits in isolation2.
  • Test and review before committing: Resist the urge to commit code you think is complete—run the code and tests to be sure. Verifying the change in isolation (including checking for side effects) is important for quality 4. Additionally, self-review the diff before you commit to confirm that only the intended changes are included and that you haven’t introduced debugging statements or unrelated edits. This extra scrutiny helps maintain focus and catch mistakes early13.
  • Use interactive staging to craft focused commits: Leverage Git’s staging tools (for example, git add -p or git add --interactive) to stage changes selectively. This allows you to split a mixed set of edits into separate commits for each concern13. By carefully choosing hunks of code to include, you ensure each commit contains only relevant changes, which reinforces a clean separation of ideas and makes the history easier to understand.
  • Separate formatting or whitespace changes from functional changes: If you need to reformat code, fix indentation, or make other non-functional style tweaks, do so in a dedicated commit without mixing in code logic changes14. Isolating purely cosmetic changes prevents noise in the diff and lets reviewers concentrate on the substantive modifications. In practice, such a commit should contain no semantic code changes, making it clear that its purpose is only to improve readability or adhere to style guidelines14.
  • Isolate large-scale code movements or renames in their own commits: When moving code between files or renaming classes/functions, perform the move in one commit and make no other changes in that same commit2. This clear separation means that the move/rename commit purely relocates code. The subsequent commit can then modify that code if needed. Keeping moves separate greatly aids reviewers (who can verify no logic changed during the move) and preserves file history for tools like git blame2.
  • Use topic branches to organize development: Develop new features or fixes on separate branches rather than on the main branch. Branching isolates your work until it’s ready, which avoids polluting the main history with WIP commits and allows for code review before integration12. Once the work is complete and each commit is polished, merge the branch back (or rebase onto main) so that the main line of development remains stable and linear. This practice also makes it easier to revert or cherry-pick a set of changes if they were developed in an isolated branch.
  • Establish a consistent team workflow and commit policy: As a team, agree on how you use Git – for example, whether you squash commits on merge, if you prefer rebase over merge commits, how you name branches, and the expected commit message format. A defined workflow (such as GitHub Flow or Git Flow) and shared conventions ensure that everyone works in a compatible way4. Consistency in commit practices across the team promotes clarity and makes the project history more predictable and maintainable.
  • Adopt a standard commit message format: Use an established convention for commit messages to provide structure and meaning. For instance, many projects follow the Conventional Commits or Angular format, where each message begins with a type like feat:, fix:, docs:, etc., optionally followed by a scope, and a brief description10. Having a consistent format makes the history easy to parse and can enable automated changelog generation or semantic versioning tools.
  • Begin each commit message with a concise, descriptive summary: The first line of the commit message (the “subject”) should quickly convey what the commit does. Write it in imperative mood (as if giving an order, e.g. “Add user login audit log” not “Added” or “Adding”) and in the present tense12. Keep this summary line short (around 50 characters or less) and capitalize the first word. A good summary allows others scanning the commit log to immediately grasp the intent of the change.
  • Provide explanatory detail in the commit message body when necessary: If the reason or context for the change isn’t obvious from the code diff and summary alone, include a body in the commit message after a blank line below the summary. In this body, explain why the change was made and any background info or implications, not just what was done4. Wrap the text at roughly 72 characters per line for readability4. A well-written body can describe the problem being solved or the decision behind the implementation, helping future maintainers understand the commit without needing external references12.
  • Reference relevant issues, tickets, or external links in the footer: When a commit relates to a discussion, bug report, or feature request, mention that in the commit message. For example, include “Fixes #123” or “Refs: issue-456” in the footer to automatically link the commit to issue trackers10. This practice provides traceability – anyone reading the commit can discover the broader context or see that an issue was resolved by that commit. It also helps project management by closing issues when commits are merged, if the repository host supports it.
  • Maintain consistency and clarity in commit messages: Ensure each commit message adheres to the agreed style and contains enough information to stand on its own. Describe your changes in a way that other developers (or your future self) can understand the history without guesswork. For example, rather than just saying “Update code” or having an empty message, always supply meaningful detail. Consistent, well-structured messages across the project make the version history a valuable narrative of the project’s evolution 12.
  • Leverage tooling to enforce commit quality: Use automated checks and hooks to uphold your commit standards. A commit-msg hook can verify that commit messages meet your formatting rules (for example, rejecting messages that don’t follow the Conventional Commits pattern), and a pre-commit hook can run linters or tests to prevent code that fails checks from being committed; a minimal hook sketch follows this list. Many teams integrate these into their workflow so that commits that don’t meet the project’s guidelines are flagged or rejected before they reach the main repository. Automation in this way helps sustain high quality and consistency in a collaborative environment.
  • Strive for a clean, bisectable history: Each commit in the history should be meaningful and not merely a “fixup” for the previous one. Try to correct small mistakes (typos, minor bugs introduced in the last commit, etc.) by amending the prior commit or via an interactive rebase before pushing, rather than adding a new “fix typo” commit. The goal is a tidy commit log where any commit can be understood on its own and, if needed, reverted without having to also revert subsequent “patch” commits. This discipline makes tools like git bisect more effective and the project history more professional and maintainable.
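
For example, a minimal commit-msg hook that rejects subject lines not matching a Conventional Commits-style pattern might look like the following; the exact pattern and accepted types should be adapted to your team's conventions:

#!/bin/sh
# .git/hooks/commit-msg (enable with: chmod +x .git/hooks/commit-msg)
# Reject commit messages whose subject line does not follow the
# <type>(<scope>): <summary> convention.
subject=$(head -n 1 "$1")
echo "$subject" | grep -Eq '^(feat|fix|docs|style|refactor|test|chore)(\([a-z0-9-]+\))?!?: .+' || {
    echo "Commit message must look like '<type>(<scope>): <summary>'." >&2
    exit 1
}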

References


  1. Git documentation – gitworkflows manual page (official guidance on splitting work into small logical steps and keeping each commit buildable and testable).
  2. Linux kernel documentation – Submitting Patches (“Separate each logical change into a separate patch”; guidance on keeping a patch series buildable and bisectable).
  3.
  4.
  5.
  6. Hacker News – community discussion about using failing tests for git bisect (“When fixing a bug, add a failing test first, as a separate commit”).
  7.
  8. Stack Overflow – discussion on why small, atomic Git commits are easier to understand, revert, and cherry-pick (“Why I need small Git commit every time?”).
  9. Optimized by Otto blog – “Five requirements for a good git commit” (atomic, with tests and docs included; “How to make a good git commit”).
  10. Conventional Commits v1.0.0 – specification for structured commit messages (types like feat/fix/docs, and marking breaking changes).
  11. Angular project – contributing guidelines (commit message formatting rules that inspired Conventional Commits; angular/CONTRIBUTING.md on GitHub).
  12. GitLab – “What are Git version control best practices?” (advice on making incremental, small changes).
  13. DEV Community – “Make Small Commits” by oculus42 (tips on using IDE tools to manage focused commits).
  14. Stack Overflow – “Committing when changing source formatting?” (discussion on separating cosmetic from functional changes).

Spacemacs Ultimate Cheatsheet

A cheatsheet I wrote for all the Spacemacs hotkeys worth knowing.

    [Embedded PDF: Spacemacs cheatsheet preview, 5 pages]

Download Cheatsheet

Latex PDF from Markdown

I write a lot of technical documents, and it is important to me that they look nice. In the past I have always resorted to using latex directly. It's an ugly language, hard to maintain, but it has always been the best solution. Microsoft Word completely fails at complex documents, and is tedious and buggy if you want to do anything technical with it. If I have a simpler document without much math or special formatting I've always used Markdown.

Recently, however, I was able to get the best of both worlds. R-script's flavored Markdown adds a tremendous amount of power to plain Markdown. The most important feature is that it uses pandoc to compile your Markdown text through a latex template, which is then used to produce a PDF. This means I get all the power of latex with all the simplicity of Markdown. That alone made it a winner for me, but it can also compile into HTML and a number of other formats, which makes it even more useful.

It also has one other very sweet added bonus if you don't mind coding in R-Script: you can put R-script code directly into your markdown in order to produce charts or other output programmatically. R-script is a heavily math oriented programming language so it can be extremely powerful when writing math papers.
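
For example, a small R chunk embedded directly in the Markdown can generate a figure while the document compiles; the chunk name and caption below are just placeholders:

```{r sine-plot, echo=FALSE, fig.cap="A sine wave generated while the document compiles"}
x <- seq(0, 2 * pi, length.out = 200)
plot(x, sin(x), type = "l", xlab = "x", ylab = "sin(x)")
```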

Getting Started

If you want to get started using R-flavored Markdown to compile into latex and PDF, I have provided my templates on GitHub. I also have another GitHub repository where I provide a complete example of a Markdown document and a makefile that will compile it into a PDF. It's really simple to figure out: just clone the repository and start editing. Make sure, of course, that you have R-script installed along with all the prerequisites you might need. Finally, I also have a third GitHub repository with even more templates and examples. Some of the examples go in depth on showing how you can use R-script inside the Markdown to generate plots, graphs, or other data. It's worth a look; there are templates for CVs/resumes, articles, PowerPoint presentations, syllabi, etc.
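
If you would rather not use the makefile, the same document can also be rendered straight from the command line with the rmarkdown package; the file name here is just a placeholder, and the output format comes from the front-matter shown below:

Rscript -e "rmarkdown::render('paper.Rmd')"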

R-flavored Markdown files use the extension "Rmd" and use all the conventional Markdown syntax with some additions. All you need to do is add some front-matter to the beginning of the Markdown file, which provides some hints to the latex template for formatting. The front-matter looks like the following, taken directly from the example I linked. The rest of the content of the file is just the Markdown, very simple. You can see the full file with front-matter and Markdown as used in this example by clicking the link.

output:
  pdf_document:
    citation_package: natbib
    keep_tex: true
    fig_caption: true
    latex_engine: pdflatex
    template: templates/syncleus-white.tex

# Required properties
title: "Conditional Probabilities and Bayes Theorem"
abstract: "A simple explanation of conditional probabilities."
author:
  name: Jeffrey Phillips Freeman
  affiliation: Syncleus
  correspondence: jeffrey.freeman@syncleus.com

# Optional Properties
other-authors:
- name: John Doe
  affiliation: NoTech
tags: Statistics, Math
start-date: "April 6, 2017"
date: "`r format(Sys.time(), '%B %d, %Y')`"
draft: "`r format(Sys.time(), '%B %d, %Y')`"
revision: 1.1
tocdepth: 5
title-color: 0,0,90
# No indentation
indent: 0pt
parskip: 3mm
# Standard Anglo-american indentation
#indent: 15pt
#parskip: 0mm
archive: Pseudoscience Journal of America
geometry: left=2cm, right=2cm, top=2.25cm, bottom=2.25cm, headheight=13.6pt, letterpaper
bibliography: ./probabilities.bib
biblio-style: plainnat
biblio-title: References

Below you will see the output of the compiled PDF using this example; different templates can give different layouts and formatting.

    [Embedded PDF: preview of the compiled example document]

In the above example you can see how the date properties are generated dynamically as the current date using R-script. This gives you just a small taste of how you can use R-script directly inside the Markdown files. Most of the properties above are self-explanatory, but let me provide a few descriptions for clarity all the same.

  • title - The title of the document.
  • abstract - A few sentences describing the document, used in the abstract box once rendered.
  • author - Information about the author.
  • other-authors - Optional supplemental authors.
  • tags - Optional list of keywords.
  • start-date - Date the author began working on the paper. This is used in the copyright information.
  • date - Date the paper was released; this is used in the copyright information.
  • draft - Optional text which will appear in a Draft watermark on the paper. If this property is left out then no draft watermark will be added.
  • revision - The revision version number of the paper, displayed in the upper right corner.
  • tocdepth - The maximum depth displayed in the table of contents for nested sections.
  • title-color - Three comma-separated integers from 0 to 255 representing the RGB color of the title text.
  • indent - The size of the paragraph indentation.
  • parskip - The vertical distance between paragraphs.
  • archive - The name of the journal or archive the paper is to be published in.
  • geometry - Optional parameters that override the latex geometry, useful for setting custom margins.
  • bibliography - Location of the natbib or BibTeX bibliography file.
  • biblio-style - Optional override for the bibliography style.
  • biblio-title - The title used for the bibliography section.

That's really all there is to it. One last thing: I made sure the properties I picked for my template match those expected by a Middleman blog written in Markdown. This adds the convenience that the same source file can be compiled by either Middleman or R-script.

Hyperassociative Map Explanation

Introduction

Almost 8 years ago, on Aug 15, 2009, I invented a new game-changing algorithm called the Hyperassociative Map algorithm. It was released as part of the dANN v2.x library. The HAM algorithm, as it is often called, has since been used by countless developers and in hundreds of projects. HAM is a Graph Drawing algorithm that is similar to force-directed algorithms but in a class all its own. Unlike other force-directed algorithms, HAM does not have a sense of momentum or acceleration, which makes it debatable whether it can even still be called force-directed.

Below is a video demonstration of HAM in action. In this 3D visualization the vertices of the graph are displayed as grey spheres but the edges are not rendered. The graph's topology is relatively simple, containing 128 nodes in groups of 16, layered such that each group is fully connected to each adjacent group. This results in 256 edges between each pair of adjacent groups. Since the groups at either end are adjacent to only one other group, there are seven adjacent pairs, for a total of 1,792 edges. Despite this the graph aligns quickly and smoothly on a single 1 GHz processor as demonstrated in the video. It starts with randomized locations for each vertex and then aligns. After each alignment the graph is reset with new random starting positions to show that the same alignment is achieved every time.


What makes HAM so special is that it retains many of the advantages that have made force-directed algorithms so popular while simultaneously addressing their shortcomings. Wikipedia describes the following advantages of using force-directed algorithms, all of which hold true for the HAM algorithm.

  • Good-quality results - The output obtained usually have very good results based on the following criteria: uniform edge length, uniform vertex distribution and showing symmetry. This last criterion is among the most important ones and is hard to achieve with any other type of algorithm.
  • Flexibility - Force-directed algorithms can be easily adapted and extended to fulfill additional aesthetic criteria. This makes them the most versatile class of graph drawing algorithms. Examples of existing extensions include the ones for directed graphs, 3D graph drawing, cluster graph drawing, constrained graph drawing, and dynamic graph drawing.
  • Intuitive - Since they are based on physical analogies of common objects, like springs, the behavior of the algorithms is relatively easy to predict and understand. This is not the case with other types of graph-drawing algorithms.
  • Simplicity - Typical force-directed algorithms are simple and can be implemented in a few lines of code. Other classes of graph-drawing algorithms, like the ones for orthogonal layouts, are usually much more involved.
  • Interactivity - Another advantage of this class of algorithm is the interactive aspect. By drawing the intermediate stages of the graph, the user can follow how the graph evolves, seeing it unfold from a tangled mess into a good-looking configuration. In some interactive graph drawing tools, the user can pull one or more nodes out of their equilibrium state and watch them migrate back into position. This makes them a preferred choice for dynamic and online graph-drawing systems.
  • Strong theoretical foundations - While simple ad-hoc force-directed algorithms often appear in the literature and in practice (because they are relatively easy to understand), more reasoned approaches are starting to gain traction. Statisticians have been solving similar problems in multidimensional scaling (MDS) since the 1930s, and physicists also have a long history of working with related n-body problems - so extremely mature approaches exist. As an example, the stress majorization approach to metric MDS can be applied to graph drawing as described above. This has been proven to converge monotonically. Monotonic convergence, the property that the algorithm will at each iteration decrease the stress or cost of the layout, is important because it guarantees that the layout will eventually reach a local minimum and stop. Damping schedules cause the algorithm to stop, but cannot guarantee that a true local minimum is reached.

However, the two disadvantages described for force-directed algorithms, namely high running time and poor local minima, have been corrected in the HAM algorithm. As described earlier HAM is not a true force-directed algorithm because it lacks any sense of momentum. This was intentional, as it ensures there is no need for a damping schedule to eliminate oscillations that arise from the momentum of nodes. This has the added advantage that the algorithm does not prematurely come to rest at a local minimum. It also means fewer processing cycles are wasted on modeling oscillations and vibrations throughout the network.

These properties alone already make HAM a worthwhile algorithm for general study and real-world applications, however it is important to note that HAM was originally designed with a very specific use case in mind. Originally HAM was designed to facilitate the distribution of massive real-time graph processing networks: the sort of scenario where each vertex in a graph has to process some input data and produce some output data, and where each vertex is part of a large interdependent graph working on the data in real time. When distributing the tasks across a cluster of computers it is critical that vertices that are highly interconnected reside on the same computer in the cluster, and physically close to the computers housing the vertices that will ultimately receive the data, process it, and then carry it throughout the rest of the network. For this purpose HAM was created to model graphs such that each node in a compute cluster took ownership of the tasks associated with vertices that were spatially close to each other according to HAM's drawing of the compute graph.

In order for HAM to be successful at its job it needed to exhibit a few very specific properties. For starters, the interactivity property mentioned earlier was a must. HAM needed to be able to work with a graph that is constantly changing its topology, with new vertices able to be added, removed, or reconfigured in real time. This is ultimately what led the algorithm to be modeled in a way that made it similar to force-directed algorithms.

The other requirement is that the motion of the vertices had to be smooth, without any oscillations as they align. This was critical because if a vertex oscillated while it was near the border that distinguishes one compute node from another, each crossing of that border would cause the associated task to be transferred between nodes in the cluster. Since this is an expensive operation, it was important that the vertices didn't jitter and cross these borders excessively as HAM aligned them.

Finally HAM needed to be able to be parallelized and segmented. That means that it needed to scale well for multi-threading but in such a way that each thread didn't need to be aware of the entire graph in order to process it; instead each thread had to be capable of computing the alignment of HAM on an isolated section of the graph. This is obviously critical because of the distributed nature of the compute graph, particularly if we want something capable of unbounded scaling. I basically wanted an algorithm that could be successful on even massively large graphs.

With almost 8 years of testing it has become evident that HAM is top in its class compared to many graph drawing algorithms. Despite this it is still scarcely understood by those studying graph drawing algorithms. For this reason I wanted to write this article to share some of its internal workings so others can adapt and play with the algorithm for their own projects.

The Algorithm

In this section I want to get into the internal workings of the Hyperassociative Map algorithm, HAM. Below is the pseudocode breakdown explaining the algorithm. Notice I use some math notation here for simplicity. Most notably I use vector notation: lowercase letters such as p and q represent vectors, and the norm, or magnitude, of a vector is written with double vertical bars on either side, for example ||p||. If you have trouble with vector notation or just want to see a concrete example, the full working Java code can be found at the end of this article for reference.

Algorithm: Hyperassociative Map

Require: χ~ > 0
Require: δ > 1
Require: η = 0.05
Require: β = 0.005

1:  procedure HAM(vertex set as g)
2:      Randomize(g)
3:      while AlignAll(g) > β · χ~ do
4:          optionally recenter the graph
5:      end while
6:  end procedure

7:  procedure AlignAll(vertex set as g)
8:      ζ = 0
9:      for all v in g do
10:         Δp = Align(v)
11:         if ||Δp|| > ζ then
12:             ζ = ||Δp||
13:         end if
14:         Place(v, Position(v) + Δp)
15:     end for
16:     return ζ
17: end procedure

18: procedure Align(vertex as v)
19:     p = Position(v)
20:     Δp = 0 (the zero vector)
21:     for all m in Neighbors(v) do
22:         q = Position(m) − p
23:         Δp = Δp + (q / ||q||) · (||q|| − χ~) · η
24:     end for
25:     for all m in NotNeighbors(v) do
26:         q = Position(m) − p
27:         c = −(q · η) / ||q||^(δ+1)
28:         if ||c|| > χ~ then
29:             c = (c / ||c||) · χ~
30:         end if
31:         Δp = Δp + c
32:     end for
33:     return Δp
34: end procedure

35: procedure Randomize(vertex set as g)
36:     randomize the position of every vertex in g
37: end procedure

38: procedure Place(vertex as v, vector as p)
39:     set the position of v to p
40: end procedure

41: procedure Neighbors(vertex as v)
42:     return the set of all vertices adjacent to v
43: end procedure

44: procedure NotNeighbors(vertex as v)
45:     s = set of all vertices not adjacent to v
46:     w = set of all vertices whose position is close to that of v
47:     return s ∩ w
48: end procedure

49: procedure Position(vertex as v)
50:     return the vector representing the position of v
51: end procedure

Obviously the pseudocode packs a lot of information into only a few lines, so I'll try to explain some of the more important parts so you have an idea of what you're looking at.

Constants

First, let's consider the constants defined at the beginning. The variable χ~ is called the Equilibrium Distance. It defines the ideal distance between two vertices connected by an edge. If a pair of vertices connected by a single edge are the only vertices present then they will align such that they are approximately as far apart as the value of χ~. For simplicity here we have represented χ~ as a single constant, but in practice it is also possible to assign a different value to this constant for each edge, resulting in a graph with different aesthetic qualities. This value must of course always be a positive number greater than 0. The default value, and the one used in the demonstration video, is 1.

The second constant is called the Repulsion Strength and is represented by the δ variable. This constant determines how strong the repulsion between two unconnected vertices is, that is, two vertices not connected by an edge. Lower values of δ result in a stronger repulsion and larger values represent a weaker repulsion. The default value is 2 and this is the value used in the demonstration video.

Next is the Learning Rate constant, η. This is simply a scaling factor applied when the vertices are aligned to ensure the graph is aligned to each node with equivalent effect rather than over-fitting to the last node processed.

The last constant is the Alignment Threshold, β, which represents the minimum movement threshold. Once the vertices move less than this value during an alignment cycle it is presumed the graph is sufficiently aligned and the loop ends.

Align Procedure

The algorithm itself is represented such that it is broken up into three major procedures. The procedure named HAM is the entry point for the algorithm, the procedure named Align calculates the incremental alignment for a single vertex, and the procedure named AlignAll calculates alignment once for every vertex in the graph.

Let's first explain what is going on in the Align procedure. Here we have a single value being passed in, the vertex to be aligned, v. On line 19 the current position of the vertex is obtained and represented as a euclidean vector, p. Next, on line 20, a zero vector is initialized and represented as Δp. The vector Δp is calculated throughout the procedure and represents the desired change to the current position p. In other words, once Δp is calculated it can be added to the current position of the vertex and will result in the new position for the vertex. Just as if calculating forces, Δp is the sum of all the composite influences acting on the vertex; it represents the overall influence exerted on the vertex at any time.

When calculating Δp the procedure must iterate through all the other vertices that have an effect on the vertex being aligned. There are two types of vertices, each with a different effect: neighbors and non-neighbors. Neighbors are all the vertices connected directly to the current vertex by an edge; non-neighbors are all the other vertices not connected by an edge.

First the influence from the neighbor vertices is calculated on lines 21 - 24. The influence two neighbor vertices have on each other differs depending on how far apart they are. If they are closer than the Equilibrium Distance, χ~, then the effect is repulsive. If they are farther apart than χ~ then the effect is attractive. The calculation for this is represented by line 23. It takes the vector q, representing the difference between the position of the neighbor and the position of the vertex being aligned, and rescales it to the magnitude (||q|| − χ~) · η. To look at it another way, if the scaling were just ||q|| − χ~ then the vertex would move exactly to the Equilibrium Distance; instead the step is scaled down by η, which adjusts how quickly the vertex approaches its equilibrium point.

Next the influence between non-neighbor vertices is calculated on lines 25 - 32. Non-neighbor vertices, that is vertices not connected by an edge, always exhibit a purely repulsive influence. Line 27 calculates this using a similar technique to before. That is, the difference between the positions of the two vertices is represented by q and its magnitude is then rescaled. Of course it is also negated to indicate that the force is repulsive. The equation only seems confusing in its simplified and compacted form. Initially it was derived by calculating the new magnitude of q as the following.

−η / ||q||^δ

This makes a lot more sense as we know in nature repulsive forces are the inverse square of their distance. So this accurately represents a repulsive influence that diminishes with distance. Once we actually apply that magnitude to the vector and simplify we arrive at our final equation.

q · (−η / ||q||^δ) / ||q|| = −(q · η) / ||q||^(δ+1)

The only caveat to this is seen in lines 28 to 30, where the distance moved as a result of the repulsive influence is checked. If it is greater than the Equilibrium Distance, χ~, then its magnitude is scaled back to χ~. This is done because at very close distances the rapidly growing repulsive influence becomes overwhelming, and we want to ensure the majority of this influence works at a distance, allowing the graph to spread apart while still letting the other influences be the dominant influences on the graph.

At this point the computed change in position for the vertex is simply returned at line 33 for further processing by the AlignAll procedure.

AlignAll Procedure

The AlignAll procedure is extremely straightforward. It is passed the set of all vertices in the graph as g and iterates over the set, aligning them one at a time. Each vertex gets aligned once per call to the procedure, which means the procedure will usually need to be called multiple times.

On line 8 the Maximum Vertex Movement variable, represented as ζ, is initialized to 0. This variable tracks the greatest distance any vertex moved during the alignment; after being calculated its value is returned on line 15. The Maximum Vertex Movement is important for determining when the HAM algorithm has finished processing.

Other than that this procedure doesn't do anything special: the vertex alignment vector is calculated on line 10 and the new position for the vertex is set on line 14.
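In sketch form the whole procedure is just a loop. Here align(position) stands in as a hypothetical helper that sums the neighbor and non-neighbor influences sketched earlier and returns the resulting change vector:

// Sketch only: align every vertex once and report the greatest distance any
// vertex moved (ζ). align(position) is a hypothetical helper, not library code.
static double alignAll(double[][] vertices) {
    double maxMovement = 0.0;                      // ζ, initialized to 0
    for (double[] position : vertices) {
        double[] change = align(position);         // composite influence on this vertex
        double moved = 0.0;
        for (int i = 0; i < position.length; i++) {
            position[i] += change[i];              // apply the change in place
            moved += change[i] * change[i];
        }
        maxMovement = Math.max(maxMovement, Math.sqrt(moved));
    }
    return maxMovement;                            // HAM uses this to decide when to stop
}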

HAM Procedure

The HAM procedure is another rather straightforward procedure to explain. It starts by assigning initial random coordinates to each vertex in the graph. After that it loops, calling AlignAll repeatedly until the graph is sufficiently aligned.

On line 3 the AlignAll procedure is called in a loop until the Maximum Vertex Movement it returns is less than β·χ~, which is just the Alignment Threshold scaled by the Equilibrium Distance constant. The Alignment Threshold is sufficiently small that if the movements in the graph fall below this value they are considered negligible and the alignment can end.
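As a sketch, the driving loop really is only a few lines (this reuses the hypothetical alignAll from the sketch above; alignmentThreshold stands in for β):

// Sketch only: assign random starting coordinates, then keep calling AlignAll
// until the largest movement in a pass drops below β · χ~.
static void ham(double[][] vertices, double equilibriumDistance, double alignmentThreshold) {
    for (double[] position : vertices) {
        for (int i = 0; i < position.length; i++) {
            position[i] = Math.random() * 2.0 - 1.0;         // random start in [-1, 1)
        }
    }
    double maxMovement;
    do {
        maxMovement = alignAll(vertices);                    // one full pass over the graph
    } while (maxMovement >= alignmentThreshold * equilibriumDistance);
}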

As an optional step, after each alignment iteration it may be desirable to translate the entire graph so it is centered around the zero vector. A small amount of drift occurs as the alignment of the graph is calculated, and recentering ensures the graph stays in the middle of the view when rendered. The drift is usually negligible, however, so this step is entirely optional. In the full Java example below the logic for centering the graph is included.
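For completeness, here is a minimal sketch of that recentering step in the same plain-array style; the full class below does the equivalent with its Vector type in processLocally and recenterNodes:

// Sketch only: compute the average position of all vertices and translate the
// whole graph so that average point sits at the zero vector.
static void recenter(double[][] vertices) {
    if (vertices.length == 0) {
        return;
    }
    int dimensions = vertices[0].length;
    double[] center = new double[dimensions];
    for (double[] position : vertices) {
        for (int i = 0; i < dimensions; i++) {
            center[i] += position[i] / vertices.length;      // running average of every position
        }
    }
    for (double[] position : vertices) {
        for (int i = 0; i < dimensions; i++) {
            position[i] -= center[i];                        // shift the graph back to the origin
        }
    }
}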

Appendix: Full Java Code

import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Map.Entry;
import java.util.Random;

// Graph, GraphDrawer, Vector, TraversableCloud and Weighted are assumed to be
// provided by the surrounding graph library.
public class HyperassociativeMap<G extends Graph<N, ?>, N> implements 
        GraphDrawer<G, N> {
    private static final double REPULSIVE_WEAKNESS = 2.0;
    private static final double DEFAULT_LEARNING_RATE = 0.05;
    private static final double EQUILIBRIUM_DISTANCE = 1.0;
    private static final double EQUILIBRIUM_ALIGNMENT_FACTOR = 0.005;

    private final G graph;
    private final int dimensions;
    private Map<N, Vector> coordinates = Collections.synchronizedMap(new 
            HashMap<N, Vector>());
    private static final Random RANDOM = new Random();
    private final boolean useWeights;
    private double equilibriumDistance;
    private double learningRate = DEFAULT_LEARNING_RATE;
    private double maxMovement = 0.0;

    public HyperassociativeMap(final G graph, final int dimensions, final 
    double equilibriumDistance, final boolean useWeights) {
        if (graph == null)
            throw new IllegalArgumentException("Graph can not be null");
        if (dimensions <= 0)
            throw new IllegalArgumentException("dimensions must be 1 or more");

        this.graph = graph;
        this.dimensions = dimensions;
        this.equilibriumDistance = equilibriumDistance;
        this.useWeights = useWeights;

        // refresh all nodes
        for (final N node : this.graph.getNodes()) {
            this.coordinates.put(node, randomCoordinates(this.dimensions));
        }
    }

    @Override
    public G getGraph() {
        return graph;
    }

    public double getEquilibriumDistance() {
        return equilibriumDistance;
    }

    public void setEquilibriumDistance(final double equilibriumDistance) {
        this.equilibriumDistance = equilibriumDistance;
    }

    public void resetLearning() {
        maxMovement = 0.0;
    }

    @Override
    public void reset() {
        resetLearning();
        // randomize all nodes
        for (final N node : coordinates.keySet()) {
            coordinates.put(node, randomCoordinates(dimensions));
        }
    }

    @Override
    public boolean isAlignable() {
        return true;
    }

    @Override
    public boolean isAligned() {
        return isAlignable()
                && (maxMovement < (EQUILIBRIUM_ALIGNMENT_FACTOR * 
                equilibriumDistance))
                && (maxMovement > 0.0);
    }

    @Override
    public void align() {
        // refresh all nodes
        if (!coordinates.keySet().equals(graph.getNodes())) {
            final Map<N, Vector> newCoordinates = new HashMap<N, Vector>();
            for (final N node : graph.getNodes()) {
                if (coordinates.containsKey(node)) {
                    newCoordinates.put(node, coordinates.get(node));
                } else {
                    newCoordinates.put(node, randomCoordinates(dimensions));
                }
            }
            coordinates = Collections.synchronizedMap(newCoordinates);
        }

        maxMovement = 0.0;
        Vector center;

        center = processLocally();

        // divide each coordinate of the sum of all the points by the number of
        // nodes in order to calculate the average point, or center of all the
        // points
        for (int dimensionIndex = 1; dimensionIndex <= dimensions; 
             dimensionIndex++) {
            center = center.setCoordinate(center.getCoordinate
                    (dimensionIndex) / graph.getNodes().size(), dimensionIndex);
        }

        recenterNodes(center);
    }

    @Override
    public int getDimensions() {
        return dimensions;
    }

    @Override
    public Map<N, Vector> getCoordinates() {
        return Collections.unmodifiableMap(coordinates);
    }

    private void recenterNodes(final Vector center) {
        for (final N node : graph.getNodes()) {
            coordinates.put(node, coordinates.get(node).calculateRelativeTo
                    (center));
        }
    }

    public boolean isUsingWeights() {
        return useWeights;
    }

    Map<N, Double> getNeighbors(final N nodeToQuery) {
        final Map<N, Double> neighbors = new HashMap<N, Double>();
        for (final TraversableCloud<N> neighborEdge : graph.getAdjacentEdges
                (nodeToQuery)) {
            final Double currentWeight = (((neighborEdge instanceof Weighted)
                    && useWeights) ? ((Weighted) neighborEdge).getWeight() : 
                    equilibriumDistance);
            for (final N neighbor : neighborEdge.getNodes()) {
                if (!neighbor.equals(nodeToQuery)) {
                    neighbors.put(neighbor, currentWeight);
                }
            }
        }
        return neighbors;
    }

    private Vector align(final N nodeToAlign) {
        // calculate equilibrium with neighbors
        final Vector location = coordinates.get(nodeToAlign);
        final Map<N, Double> neighbors = getNeighbors(nodeToAlign);

        Vector compositeVector = new Vector(location.getDimensions());
        // align with neighbours
        for (final Entry<N, Double> neighborEntry : neighbors.entrySet()) {
            final N neighbor = neighborEntry.getKey();
            final double associationEquilibriumDistance = neighborEntry
                    .getValue();

            Vector neighborVector = coordinates.get(neighbor)
                    .calculateRelativeTo(location);
            double newDistance = Math.abs(neighborVector.getDistance()) - 
                    associationEquilibriumDistance;
            newDistance *= learningRate;
            neighborVector = neighborVector.setDistance(newDistance);
            compositeVector = compositeVector.add(neighborVector);
        }
        // calculate repulsion with all non-neighbors
        for (final N node : graph.getNodes()) {
            if ((!neighbors.containsKey(node)) && (node != nodeToAlign)
                    && (!graph.getAdjacentNodes(node).contains(nodeToAlign))) {
                Vector nodeVector = coordinates.get(node).calculateRelativeTo
                        (location);
                double newDistance = -1.0 / Math.pow
                        (nodeVector.getDistance(), REPULSIVE_WEAKNESS);
                if (Math.abs(newDistance) > Math.abs(equilibriumDistance)) {
                    newDistance = Math.copySign(equilibriumDistance, 
                            newDistance);
                }
                newDistance *= learningRate;
                nodeVector = nodeVector.setDistance(newDistance);
                compositeVector = compositeVector.add(nodeVector);
            }
        }
        Vector newLocation = location.add(compositeVector);
        final Vector oldLocation = coordinates.get(nodeToAlign);
        double moveDistance = Math.abs(newLocation.calculateRelativeTo
                (oldLocation).getDistance());

        if (moveDistance > maxMovement) {
            maxMovement = moveDistance;
        }

        coordinates.put(nodeToAlign, newLocation);
        return newLocation;
    }
    public static Vector randomCoordinates(final int dimensions) {
        final double[] randomCoordinates = new double[dimensions];
        for (int randomCoordinatesIndex = 0; randomCoordinatesIndex < 
                dimensions; randomCoordinatesIndex++) {
            randomCoordinates[randomCoordinatesIndex] = (RANDOM.nextDouble() 
                    * 2.0) - 1.0;
        }

        return new Vector(randomCoordinates);
    }

    private Vector processLocally() {
        Vector pointSum = new Vector(dimensions);
        for (final N node : graph.getNodes()) {
            final Vector newPoint = align(node);
            for (int dimensionIndex = 1; dimensionIndex <= dimensions; 
                 dimensionIndex++) {
                pointSum = pointSum.setCoordinate(pointSum.getCoordinate
                        (dimensionIndex) + newPoint.getCoordinate
                        (dimensionIndex), dimensionIndex);
            }
        }
        return pointSum;
    }
}

LaTeX Inspired Rendering of Algorithms in HTML

I recently updated the code for my blog (yes, this one) to render algorithms as pseudocode in classic LaTeX style. This style has been used extensively since the 80's and continues to serve as a standard format for presenting algorithms as pseudocode. Below is an example of what I'm talking about: it's the Quicksort algorithm rendered using this style.

Algorithm quicksort

1:  procedure Quicksort(a, p, r)
2:    if p < r then
3:      q = Partition(a, p, r)
4:      Quicksort(a, p, q − 1)
5:      Quicksort(a, q + 1, r)
6:    end if
7:  end procedure
8:  procedure Partition(a, p, r)
9:    x = a[r]
10:   i = p − 1
11:   for j = p to r − 1 do
12:     if a[j] < x then
13:       i = i + 1
14:       exchange a[i] with a[j]
15:     end if
16:   end for
17:   exchange a[i + 1] with a[r]
18:   return i + 1
19:  end procedure

If you'd like to use it yourself it isn't too hard. You can get most of the magic from a nice little JavaScript library on GitHub called pseudocode.js. The problem is that the JavaScript alone only gets you halfway there: the parts that are there work great, but it won't search your HTML and render everything for you; you have to write that yourself. To solve that, put the following block of code at the end of your page's source, and make sure the project linked above is already installed.

  // find every element that holds pseudocode source
  var blocks = document.getElementsByClassName("pseudocode");
  for (var blockId = 0; blockId < blocks.length; blockId++) {
    var block = blocks[blockId];

    // grab the raw pseudocode text and set the rendering options
    var code = block.textContent;
    var options = {
      lineNumber: true
    };

    // render into a new element inserted right after the source block
    var outputEl = document.createElement('div');
    outputEl.className += " pseudocode-out";
    block.parentNode.insertBefore(outputEl, block.nextSibling);

    pseudocode.render(code, outputEl, options);
  }

  // remove the original source blocks now that they have been rendered
  // (blocks is a live collection, so keep removing the first entry)
  while (blocks[0]) {
    blocks[0].parentNode.removeChild(blocks[0]);
  }
  

Now all you have to do is add any element with a class of "pseudocode" and write the actual pseudocode inside the element, following the format specified in the pseudocode.js project. I prefer to use a "pre" element so that if JavaScript is turned off the page still shows a human-readable block. Other than that I didn't have to do anything special to get it working.