Git vs. Subversion – Which to Use for Your Next Project

I recently did some research to help decide between Git and Subversion for a new project, and decided to include it in my blog (with permission).  While I don’t formally give attributions, any items in quotes came from other resources on the web.

 

Git Advantages

  1. Just as Subversion was the evolutionary successor to CVS in open source version control, Git is the next evolutionary step
  2. Git offers distributed and federated branching, as opposed to Subversion’s limitation of a single server with multiple clients.  A Git client can clone from a remote Git repo.  The user can make changes in small units of work and commit them locally, repeating that cycle as often as needed.  When ready, they can push their changes back to the remote repo, where everyone else can later pull them.  This gives the local developer the safety of making and committing small changes as part of a much larger change, and then submitting that larger change as a single unit when ready (all while constantly pulling the latest changes from the remote repo)
  3. In addition, other users can be set up to check out from the user’s local repo, resulting in a federated model.  Nothing technically designates which repo is the official master; that is established only by convention and agreement.  Git uses a peer-to-peer model, whereas Subversion is client-server.  This becomes even more important should the official repo be lost for some reason
  4. Because it is a distributed model, the workflow is established by the developer, not by the centralized repository owner.  “Git does not depend on a centralized server, but does have the ability to synchronize with other Git repositories – to push and pull changes between them. This means that you can add multiple remote repositories to your project, some read-only and some possibly with write access as well, meaning you can have nearly any type of workflow you can think of”
  5. “Due to being distributed, you inherently do not have to give commit access to other people in order for them to use the versioning features. Instead, you decide when to merge what from whom.  That is, because Subversion controls access, in order for daily checkins to be allowed – for example – the user requires commit access. In Git, users are able to have version control of their own work while the source is controlled by the repo owner.”
  6. Creating a new branch in Git is much quicker (i.e., 5 seconds), easier, and less centralized than in Subversion.  A developer can make that decision locally without having to consult or impact anyone else.  Until all parties agree, that branch can remain invisible to the rest of the team.  This allows more experimentation, parallel development, and rollback of failed prototypes or Scrum spiking with no impact to the rest of the team.  A new branch requires only 41 bytes (a file containing a SHA), and deleting a branch means just deleting a single file (though there are commands to do it).  “Creating a repository is a trivial operation: mkdir foo; cd foo; git init. That’s it”
  7. Branches and tags are not just copies that can be altered, but are true first-class citizens in Git.  Git offers a real audit trail and real tags, as opposed to using branches to simulate tags.  In Git, each point in history is identified by a SHA key generated from the state of the code. It is easy to track the history if someone tries to tamper with the code or mistakenly deploys the wrong code into production environments. Git has a very strong audit trail
  8. Integrating branches and merging is far easier and less conflict-ridden (with fewer chances of accidents or problems) in Git.  Git has very strong merge algorithms.  Developers can do full merges locally before having to push the merge back into the main branch
  9. “Branch merging is simpler and more automatic in Git. In Subversion you need to remember what was the last revision you merged from so you can generate the correct merge command. Git does this automatically, and always does it right. Which means there’s less chance of making a mistake when merging two branches together”
  10. “Branch merges are recorded as part of the proper history of the repository. If I merge two branches together, or if I merge a branch back into the trunk it came from, that merge operation is recorded as part of the repository history as having been performed by me, and when. It’s hard to dispute who performed the merge when it’s right there in the log”
  11. “If you have partial merges for a work in progress, you will take advantage of the Git staging area (index) to commit only what you need [break it up and check in what you want now], stash the rest, and move on to another branch.”  By stash, he means that if you are working on a project and a bug comes in from production, you can stash your current work as a built-in function of Git, seamlessly switch to the production branch, make the code change, check it in, and then unstash your work and continue working just as you were before
  12. When you check out with Git, you get the entirety of the repo, not just the one branch.  You get the full history, branches, merges, versions, and everything in your local version.  This is what allows you to fully work remotely without having to have a network connection.  In addition, each new branch created carries forward the pre-branch history
  13. It is faster.  “Since all operations (except for push and fetch) are local, there is no network latency involved to a) perform a diff, b) view file history, c) commit changes, d) merge branches, e) obtain any other revision of a file (not just the prior committed revision), or f) switch branches”
  14. Git stores its information in a more compressed manner than Subversion, which reduces the size effects of the previously noted advantage.  “Git’s file format is very good at compressing data, despite being a very simple format. The Mozilla project’s CVS repository is about 3 GB; it’s about 12 GB in Subversion’s fsfs format. In Git it’s around 300 MB”
  15. “The repository’s internal file formats are incredibly simple. This means repair is very easy to do, but even better, because it’s so simple it’s very hard to get corrupted. I don’t think anyone has ever had a Git repository get corrupted. I’ve seen Subversion with fsfs corrupt itself. And I’ve seen Berkeley DB corrupt itself too many times to trust my code to the bdb backend of Subversion”
  16. Git does not require little .svn folders in each of the subdirectories as SVN does, which can cause minor problems sometimes.  All the Git information is stored in a single .git folder at the top level of the working copy.  In SVN, I’ve dealt with developers from novice to expert, and the novices and intermediates seem to introduce file conflicts if they copy one folder from another SVN project in order to re-use it. In Git, you just copy the folder and it works, because Git doesn’t introduce .git folders in all its subfolders (as SVN does)
  17. “SVN is the third implementation of revision control: RCS, then CVS, and finally SVN manage directories of versioned data. SVN offers VCS features (labeling and merging), but its tag is just a directory copy (like a branch, except you are not ‘supposed’ to touch anything in a tag directory), and its merge is still complicated, currently based on meta-data added to remember what has already been merged.  Git is a file content management tool (a tool made to merge files), evolved into a true Version Control System, based on a DAG (Directed Acyclic Graph) of commits, where branches are part of the history of data (and not data itself), and where tags are true meta-data.”  In other words, having started as a tool to merge files and evolved into a true VCS is what makes Git so much more powerful than Subversion
  18. You have to go with a DVCS; it is like a quantum leap in source management
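The distributed cycle described in items 2, 6, and 11 above can be sketched end to end with a local bare repository standing in for the shared remote (all paths, file names, and branch names here are my own invention, not from any of the quoted sources):

```shell
work="$(mktemp -d)"                            # scratch area for the sketch
git init --bare "$work/central.git"            # stands in for the team's shared remote
git clone "$work/central.git" "$work/alice"    # a developer's full local repo
cd "$work/alice"
git config user.email "alice@example.com"      # commit identity for this sketch
git config user.name "Alice"

git checkout -b trunk                          # name the mainline explicitly
echo "v1" > app.txt
git add app.txt
git commit -m "small local unit of work"       # recorded locally only
git push -u origin trunk                       # publish when ready; now others can pull

git checkout -b experiment                     # instant local branch, invisible to the team
echo "half-done idea" >> app.txt               # uncommitted work in progress
git stash                                      # shelve it when a production bug arrives
git checkout trunk
echo "hotfix" > fix.txt
git add fix.txt
git commit -m "production fix"
git push                                       # share the fix
git checkout experiment
git stash pop                                  # resume exactly where we left off
```

Deleting the experiment afterwards (git branch -D experiment, from another branch) would leave no trace in the shared repository, since nothing from it was ever pushed.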

 

Subversion Advantages

 The following are the advantages of Subversion:

  1. You can check out part of a branch instead of having to check out the entire thing
  2. Subversion is stronger at storing and managing very large binary files.  “SVN is the only VCS (distributed or not) that doesn’t choke on my TrueCrypt files (please correct me if there’s another VCS that handles 500MB+ files effectively). This is because diff comparisons are streamed (this is a very essential point). Rsync is unacceptable because it’s not 2-way.”
  3. There were earlier problems with using Git on Windows back in 2008 due to lack of support, but that has been addressed at this point
  4. If your development is linear and simpler (without requiring branches and parallel work), you should stick with Subversion
  5. Because Subversion has been around longer, it may have better tool support.  This was more of a problem around 5 years ago; Git has mainstream tool adoption at this point
  6. Most people already know how to use Subversion rather than Git.  To use Git, some internal training (which I can do) will be involved – not only to use Git, but to use it as it was intended (and not as if one were using Subversion)
  7. “Walking through versions is simpler in Subversion because it uses sequential revision numbers (1,2,3,…); Git uses unpredictable SHA-1 hashes. Walking backwards in Git is easy using the ‘^’ syntax, but there is no easy way to walk forward”

More Effective Studying

I spend a lot of my free time studying computer science.  For about a year and a half, I spent about 20 hours/week.  Now I am down to only about 10 or 12.

Historically, I typically studied only one item at a time, whereas I had a friend who would do two or three.  I believe his approach to be better and have adopted it.

When you study a single technical book, not all items are equal.  The earlier items provide a required base for the later ones; you have to fully understand and recall them in addition to the later material you are studying.  What helps with that fuller comprehension?  Removal and time.

If I read part A of a book and wait until the next day to read part B, my subconscious mind has a chance to process that material and master it.  Thus, coming back the next day makes reading and comprehending part B all that much easier.  It is the same science that allows me to suddenly make connections on problems I have been working on while taking a shower.

Sometimes you may be aware of that processing.  For example, some nights I may recall having intense problem solving dreams around something I am studying as my brain forms new neural networks.  This also works for new video games.

Now, I am studying about five different subjects simultaneously.  Not only is the process less taxing and stressful, but I am mastering the material more effectively.

Information Architecture – Part 2

I read about half of “Information Architecture for the World Wide Web”, and then stopped there.  Not because it is not a good book; I just don’t plan to become a professional Information Architect.  If I need to go deeper at some point, I’ll read the rest.  All in all, I definitely recommend the book.

Nonetheless, it is really interesting to think about information architecture, organization, structure, and search as abstract concepts independent of an actual application, as well as how they apply to the real world of grocery stores, department stores, libraries, etc.

A couple of things really stuck out for me in addition to what I learned in “Don’t Make Me Think”:

There are two main ways to organize based upon the needs of your users:

  • Exact organizational scheme. When the user knows exactly what they are looking for (e.g., white pages)
  • Ambiguous organizational scheme. When users don’t know what they are looking for.  Some types include organizing by task or topic, among other things

Hierarchical organizational structures can be tricky as many things do not neatly fit into a strict taxonomy.  A taxonomy that allows cross-listing is referred to as polyhierarchical.  However, if too much of this takes place, the value of the organization is reduced.

In addition, there is tension between the breadth and depth of the hierarchies.  It is generally better to go for greater breadth, particularly as your site grows.

My knowledge grew the most in the area of search.  I have a friend whose site needs search capability.  I had thought of plugging something in for him for the entire site, and until then did not realize the complexities involved in matching the users’ needs to the search capabilities.

For one, your users may need recall or precision, but they can’t maximize both simultaneously, as the two trade off against each other:

  • Recall. Recall is oriented toward finding a greater number of relevant matches (e.g., doing due diligence on a company you are considering joining)
  • Precision. Precision is oriented toward finding just a few very high quality matches (e.g., instructions on deck staining)
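The trade-off can be made concrete with a toy example (the document sets and numbers here are my own illustration, not from the book):

```ruby
# Toy illustration of recall vs. precision for a single search.
relevant  = ["a", "b", "c", "d"]   # everything in the collection the user actually needs
retrieved = ["a", "b", "x"]        # what the search engine returned

hits = retrieved & relevant        # relevant documents that were actually returned

recall    = hits.size.to_f / relevant.size    # found 2 of the 4 relevant documents => 0.5
precision = hits.size.to_f / retrieved.size   # 2 of the 3 returned were relevant => ~0.67
```

Returning more documents tends to raise recall while lowering precision; returning only a few high-confidence matches does the reverse.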

You will need to configure or choose your search engine accordingly.  In addition, choosing to take advantage of automatic stemming or thesauri in your searching will result in greater recall at the expense of precision.  Your users’ goals need to play an important part in this configuration.

In addition, you can choose to index your entire site, or break it up into different search zones.  Once again, the former increases recall, while the latter increases precision.

Finally, there are numerous ways to present search results, and levels of detail to provide around them.  Once again, this will be based upon your users’ goals.

In the end, you need to really understand your users’ goals to be able to create the appropriate Information Architecture.

User Stories Applied: For Agile Software Development by Mike Cohn

So, I have been busy the last several weeks rolling out Scrum along with complementary Agile practices to a new team.  I saw this book, thumbed through it, and felt it would help me address some of the challenges I saw down the road.  I put a brown bag together one morning, and then off we went on planning our new release!

I (as Scrum Master) had already worked with the PM (i.e., Product Owner) to prioritize the features. We did not have them in user story format yet, but that was okay, as we had a set of one- to two-sentence feature descriptions.

How could that possibly be okay?  Well, the real purpose of user stories is not so much to fully document the desired functionality but to serve as placeholders to drive face-to-face discussions between developers and PM.  From the Agile Manifesto:  “Individuals and Interactions over processes and tools”.  With only a few sentences, you have to talk.  You can add a few notes and assumptions, but I like the general advice: if your user story notecard is not large enough to hold all the information, try a smaller notecard.

At other companies, I had worked unsuccessfully with PMs trying to leverage Use Cases as the new solution.  They took too long to write, the PMs did not have time to do it, and often the PMs would write use cases based upon the UI design they had in mind for a feature, as opposed to addressing the harder task of writing them based upon users’ needs.  User stories force you to address the latter.

Make no mistake, designing features based upon users’ real needs is challenging, hard, and painful.  The work we did over the last 1.5 days was much harder than the coding I expect to do in support of these features – I would go home exhausted!

First, we gathered the developers, QA, and Doc together.  PM and UX served as the customer representatives in the Estimating Game.  We had our upcoming release, and needed a rough estimate for each feature for initial release planning and re-prioritization based upon effort.

The PM would grab the next highest priority user story (feature) and present it.  Then, all of us would ask a number of questions about the feature.  After that, the developers would write their estimates down and reveal them all at once.  The estimates would often differ wildly, so the developers would discuss their rationale.  We would repeat a couple of times and come to consensus.  QA and Doc would give their estimates, and then everything was tracked.

The meeting went well at first, but then started to bog down as the PM had to leave early and we were missing the business value.  So, for our next meeting, I asked the PM to convert each feature to an epic user story on the fly.  It was an epic because it was too large in terms of what would need to be done, and required being broken down into multiple user stories.

We would then raise a number of questions, and continue to break the initial user story down into other user stories based upon end-to-end user functionality.  We all worked extra hard to keep UI out of the stories, and to use implementation only as examples to flesh out desired behavior.  What was particularly helpful was using this format from the book:

I (as role) want to be able to do (feature) because (business value)

Since our PM is busy, I decided to use the meetings with him to focus on user stories, and then have separate meetings with the implementers to play the estimating game (and later sync back together).

Now that we had this, it was much easier for the developers to work with UX on how to approach the feature, and how to design it at a high-level for estimation in separate sessions (we’ll pull the PM back in to these as well as his time frees up).

For one user story, a developer pointed out that the user story presented a solution for solving a problem rather than stating the problem that needed to be solved.  It was subtle, but by being able to back the user story up to the more abstract representation of the problem to be solved, we can design a much better feature.

In general, it takes longer for the development team to complete the estimating game for the user stories associated with each feature than it does to work with PM to break them up.

Initially, I had hoped to approach the features planned for the release in a Just in Time (JIT) approach on each Sprint, but given dependencies between the features and user stories, it was important to spend a few days up front establishing the user stories associated with the release, even if all may not end up making it.

It was painful, but necessary and effective.  Sitting down together and talking through these features was highly effective.  We have a much better understanding of how to design the features in a way that will best meet the end user’s needs.  The Developers picked up a much deeper domain knowledge of the problems our customers are facing, as well as an increased respect for the PM (as did I).  I think the PM (by participating in the initial estimating game) had a better understanding of why the estimates were as high as they were.  All in all, we are much better positioned for success.

Information Architecture – Part 1

Taking a break from development and Agile activities, I have started reading “Information Architecture for the World Wide Web” by Peter Morville & Louis Rosenfeld.  This is part of my attempt to increase the user-oriented side of my skills in addition to more technical web development skills.

The book begins by using an analogy to physical building architecture, which builds nicely on the mental model I started forming from Steve Krug’s “Don’t Make Me Think”: different building architectural styles serve different user purposes, labeling and classification enable users to navigate effectively, and search plays an important role.  While there are also similarities with physical libraries, the multi-dimensionality of the web presents a different set of problems.

What I found interesting is their discussion on Information Needs and Information Seeking Behaviors.  They discussed four types of Information Needs:

  • The Perfect Catch. You know exactly what you are looking for – someone’s telephone number, a fact about the population of the state of Louisiana, etc.  Basically, you are looking for “the right answer”
  • Information Exploration. You might be looking for the best apartment swapping services in Paris, or different investment options in your online 401(k) service (as I was recently).  There are multiple good matches.
  • Exhaustive Research. You might be doing research for your thesis, or conducting medical research about a disease a friend may have acquired.  You want to leave no stone unturned.
  • Refinding. This is where services like del.icio.us come in handy, or the “Favorites”  link in YouTube

The point is, how you design search capability and organize your site is going to differ vastly for these purposes, and an understanding of how your users will want to use your site will play a major role in your information architecture.  How you organize search, links, content and navigation will either enable or befuddle your users in their goals.  In other words, you want to set things up in such a way that your users do not need to think.

Another thing I found interesting (I am only through Part 1) is the Berry Picking Model by Dr. Marcia Bates of UCLA:

  1. Start with an information need
  2. Formulate an information request (query)
  3. Move through the site(s) in different ways
  4. Pick up important bits of information (berries)
  5. Refine your query based upon what you already found and repeat

This stuck out because I was doing this very thing this morning before reading about it:

  1. Searching for help on using a particular technique
  2. Finding some helpful articles (and adding them to del.icio.us or Evernote) while browsing, and rejecting unhelpful ones
  3. Altering the search query in the hopes that it would offer more links that better fit my need

Finally, even though the product I work on is a web-based application as opposed to a site, an understanding of Information Architecture can also be helpful in terms of how we present information and work with user requests.

Using WATIR for Browser Based Testing

I had wanted to learn to use Selenium to automate browser-based testing, but a QA person at work gave a brown bag on WATIR.  Given the number of browsers I could use it with, I decided to play around with it.

The WATIR site has a page on installation that is pretty straightforward (I didn’t bother with supporting Safari on my Mac, and the Windows install was pretty straightforward as well).  Installing the plugins for Firefox referenced on the install page was also straightforward.

You do need to start Firefox up initially using the -jssh option (for IE on Windows, it comes up automatically).  Here is how I did it for Mac:

cd /Applications/Firefox.app/Contents/MacOS
./firefox-bin -jssh

For Windows 7:

cd "C:\Program Files (x86)\Mozilla Firefox"
firefox.exe -jssh

I brought up Ruby’s IRB to start playing around.  On my Mac, I could execute commands in the IRB, but in my ruby script file, it was failing on the following:

require 'watir'

For the scripts, I needed to add require 'rubygems' first.  On Windows, I needed to do this for both IRB and ruby scripts:

require 'rubygems'
require 'watir'

To bring up Firefox, I did the following (you can just comment out the first line to bring up IE instead):

Watir::Browser.default = "firefox"
b = Watir::Browser.start "http://mysite.com"

The browser popped up to this site.  I needed to login, so I specified the name and password by finding the element by id and then specifying the text to be typed in:

b.text_field(:id, "user").set("name_of_user")
b.text_field(:id, "password").set("the_password")

In the browser, it was almost as if an invisible person typed in the text.  Next, I needed to click the “Sign In” button, but there was no id associated with it, so I had to find it by its value and click it:

b.button(:value, 'Sign In').click

Now the home page came up.  I wanted to create a new Foo, so I needed to get to the Foo page, which is referenced by a link and is called “New Foo” on the page:

b.link(:text, 'New Foo').click

I was now on the new page, and started following the steps to fill out the fields to create the new Foo.  However, I was getting the following error:

C:/Ruby/lib/ruby/gems/1.8/gems/watir-1.6.5/lib/watir/element.rb:56:in `assert_exists': Unable to locate element, using :id, "username" (Watir::Exception::UnknownObjectException)
        from C:/Ruby/lib/ruby/gems/1.8/gems/watir-1.6.5/lib/watir/element.rb:288:in `enabled?'
        from C:/Ruby/lib/ruby/gems/1.8/gems/watir-1.6.5/lib/watir/element.rb:60:in `assert_enabled'
        from C:/Ruby/lib/ruby/gems/1.8/gems/watir-1.6.5/lib/watir/input_elements.rb:327:in `set'
        from watir_fun.rb:6

I double-checked the ids; all looked well.  It turned out that the page was doing some additional javascript after having been loaded, so these fields were not ready for me to access.  By adding 'sleep 2' to the script just prior, the page had time to load, and I was able to follow the steps to create the entity.
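A fixed 'sleep 2' works, but it wastes time when the page is fast and breaks when the page is slow.  As an alternative, here is a small polling helper (my own sketch, not part of Watir) that retries a check until it passes or a timeout expires:

```ruby
# Poll a condition instead of sleeping for a fixed amount of time.
def wait_until(timeout = 10, interval = 0.2)
  deadline = Time.now + timeout
  until yield
    raise "condition not met within #{timeout}s" if Time.now > deadline
    sleep interval
  end
  true
end

# Hypothetical usage with a Watir browser object `b`:
#   wait_until { b.text_field(:id, "username").exists? }
#   b.text_field(:id, "username").set("name_of_user")
```

This way the script proceeds as soon as the javascript finishes, rather than always paying the full two seconds.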

But was the entity created successfully?  Because none of the displayed fields had a unique id (and this was only a quick experiment), I simply checked with something like the following:

b.text.include?("Hot Stocks")

Obviously, this is not acceptable for production readiness, so we would likely add ids or some easier way to access the elements via XPath and so forth.  The point is, WATIR gives you an easy way to automate interacting with a browser and checking the results.

In the end, I found picking up WATIR to be quite straightforward.  During my previous stint at my current company, I used to go through a short manual test script before checking in to make sure my changes didn’t break anything.  This weekend (for fun), I hope to code up this former script in WATIR in just a few hours and start having us use it in Development next week.

Additionally, the WATIR website is helpful and well-organized, and I hear the help and mailing lists are quite responsive and friendly.

Choosing Clojure Over Scala

I saw Venkat Subramaniam give a presentation on Scala at one of the NOVAJUG meetings. Venkat is an excellent presenter and a very smart guy.

Two weeks or so ago, Stuart Halloway gave a presentation on Clojure at another NOVAJUG meeting held at Oracle. Stuart is one of those guys who makes you marvel at how fast his brain works. Unfortunately, his one-hour presentation probably needed 90 minutes to two hours to fully absorb. I went by the book store the next night, and all copies of his Clojure book were gone.

For the last five months or so, I have been trying to figure out whether to study Clojure or Scala. Given that I am in chapter 2 of Stuart’s Clojure book (as well as the subtle title of this post), I have decided to go with Clojure.

Scala looked very promising, but Clojure seemed far more different from Java. I view that as a good thing. Getting experience in a variety of language styles, as opposed to those closer to Java, can only be a good thing. No, I am not saying that Scala is just like Java.

Clojure does have a lot in common with LISP, which I enjoyed programming in in college. But it appears to have made some improvements over standard LISP.

Its functional and transactional approach to concurrency (as opposed to manual locking) seems interesting.

Finally, just as Ruby is a more expressive language than Java, Clojure appears to be even more expressive than Ruby. Clojure may be very different from Java, but it is easier to program in (as claimed by Stuart).

Will Clojure be the next big language? Possibly. Will I be a much better developer by learning a language such as Clojure? Definitely.