Tuesday, September 30, 2014

Functional bioinformatics - Nucleotides and GC

In my quest to learn Clojure I have decided to spend a little time applying it to simple problems within my domain, which happens to be bioinformatics. Rosalind is a great site for tackling many of the well-known types of bioinformatics problems. I will post my solutions in Clojure with some explanation of how I approached each problem.

The first problem is a simple count of each nucleotide within a string s. It boils down to loading a sequence from a file (stripping out the trailing newline) and then computing and reporting the frequencies. I am doing everything within an InstaREPL tab in Light Table, so there is no main function.

Let's look at a first version of the counting code, and then see whether it can be cleaned up once the functionality is correct.


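(require 'clojure.string)

(defn displayFrequencies [s]
  {:pre [(string? s)]}
  (let [f (into (sorted-map) (frequencies (clojure.string/upper-case s)))]
    ;; apply println prints "20 12 17 21" instead of the list form "(20 12 17 21)"
    (apply println (vals f))))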

Here we take a string called s and use a precondition to check its type before the body runs. For this application we expect every sequence to be represented as a string, though a vector of characters would work equally well in practice. The frequencies function does the actual counting of each character in the string. Its output is an ordinary hash map, whose keys are not ordered. However, it is helpful to have the counts in lexicographical order of their keys (partly because that is the order the problem expects). A sorted-map behaves like a hash-map, except that its entries are kept sorted by key.
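To make each step of displayFrequencies concrete, here is a small trace on a made-up, mixed-case sample string. The names sample, upper, counts, sorted-counts and ordered are purely illustrative; only upper-case, frequencies, sorted-map and vals come from the function itself.

```clojure
(require '[clojure.string :as str])

;; Hypothetical sample input, traced through each step of displayFrequencies
(def sample "agcttag")

(def upper (str/upper-case sample))            ; "AGCTTAG"
(def counts (frequencies upper))               ; A 2, G 2, C 1, T 2 (plain map, no guaranteed key order)
(def sorted-counts (into (sorted-map) counts)) ; {\A 2, \C 1, \G 2, \T 2}
(def ordered (vals sorted-counts))             ; (2 1 2 2), i.e. A, C, G, T order

;; Caveat: a base that never occurs gets no key at all, so
;; (vals (into (sorted-map) (frequencies "AAC"))) is (2 1), not (2 1 0 0).
```

For real Rosalind input all four bases are normally present, but the caveat matters if a sequence could be missing one of them.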

While this works, it relies on a binding to f purely for readability, plus a stack of nested calls. It can be rewritten more clearly with the thread-last macro (->>), which passes each result as the last argument of the next form, as below.


(defn displayFrequencies [s]
  {:pre [(string? s)]}
  (->> s
       clojure.string/upper-case
       frequencies
       (into (sorted-map))
       vals
       (apply println)))
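One way to convince yourself that the threaded version is the same computation as the nested one is to expand the macro and compare the result. This sketch uses clojure.walk/macroexpand-all on the pipeline without its final printing step; the symbol s is just an unevaluated placeholder inside the quoted form.

```clojure
(require 'clojure.walk)

;; Expand the thread-last pipeline (minus the print step) to see the nested
;; calls it builds: each result becomes the LAST argument of the next form.
(def expanded
  (clojure.walk/macroexpand-all
    '(->> s
          clojure.string/upper-case
          frequencies
          (into (sorted-map))
          vals)))

;; expanded equals the hand-nested form:
;; (vals (into (sorted-map) (frequencies (clojure.string/upper-case s))))
```

Because ->> is purely a syntactic rewrite, the refactoring changes readability only, not behaviour.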

We really want to be able to apply displayFrequencies, or any other function, to the contents of a file containing a sequence. To make that easy, we will create a function that accepts a filename and a function to apply to the string read from that file.


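(require 'clojure.java.io 'clojure.string)

(defn processFile [filename func]
  {:pre [(string? filename) (fn? func)]}
  ;; read the whole file, strip the trailing newline, then hand the string to func
  (->> filename
       clojure.java.io/file
       slurp
       clojure.string/trim
       func))

(processFile "sequence.txt" displayFrequencies)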

This setup makes it easy to check the GC content of the sequence once that function is available.
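As a rough idea of what such a function could look like, here is a minimal sketch of a hypothetical gc-content helper (the name and behaviour are my own assumption, not the finished solution). It assumes the input is a single raw sequence string with no FASTA header lines, which is what processFile hands over, and returns the percentage of G and C characters.

```clojure
(require '[clojure.string :as str])

;; Hypothetical helper: GC content of a single raw sequence, as a percentage.
;; Assumes s contains only sequence characters (no FASTA ">" header lines).
(defn gc-content [s]
  {:pre [(string? s) (pos? (count s))]}
  (let [upper (str/upper-case s)
        ;; a set works as a predicate: it returns the char when it is G or C
        gc    (count (filter #{\G \C} upper))]
    (* 100.0 (/ gc (count upper)))))
```

With processFile in place, something like (processFile "sequence.txt" (comp println gc-content)) would print the percentage for a file. The actual Rosalind GC problem works on multiple FASTA records, so a parsing step would still be needed on top of this.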

Wednesday, September 24, 2014

Local PyPI

Recently at work we realized we needed to run a local instance of a PyPI-compatible repository for our Python packaging. In the past I have used Artifactory for Java and Groovy builds, but I have never really set up a repository for Python. While the Python packaging system is in many ways less mature than the Java ecosystem, there is still enough there to build a useful system. To me, this is a critical step in setting up a development workflow in any language.

So why is a good package repository so important for a team? Libraries and packages can be shared in many ways, such as cloning a repository or using git submodules. While this might be a fine way to get started (at least you aren't sending zip files around), it is a fairly ineffective system. Python has its own packaging scheme. One of the big goals of packaging is to have a package or build that was created by an automated process and has undergone testing, and the biggest part of that is automating the tests. After the test and package steps, we just need a place to put the results.

An easy way to make the repository equally usable by developers and by a CI pipeline is to package it into a Docker container. I use pypiserver, a fairly simple, lightweight server that can be deployed as a local PyPI instance. Below is a Dockerfile that runs it with the Twisted networking framework as the actual server.



FROM ubuntu:14.04

RUN mkdir /packages
VOLUME /packages

RUN apt-get update && apt-get install -y python python-pip python-twisted apache2-utils
RUN pip install pypiserver passlib

# Upload credentials; change the user name and password before deploying
RUN htpasswd -cb .htaccess uploadUser changeme

EXPOSE 8080

CMD ["pypi-server", "--fallback-url", "http://pypi.python.org/simple", "-P", ".htaccess", "--server", "twisted", "packages"]

The .htaccess file lets a user named uploadUser publish to this repository with the password changeme; both should be changed before you deploy. The .pypirc entry you need when building is below.



[distutils]
index-servers =
    local

[local]
repository: http://127.0.0.1:8080
username: uploadUser
password: changeme

Then build and upload a source distribution with:



python setup.py sdist upload -r local

Saturday, September 6, 2014

Unity 3D first experiences

This weekend I started learning Unity 3D. Picking Unity was nearly a necessity after my decision to purchase an Oculus Rift DK2. While other engines are supported, Unity seems to meet my needs for other projects as well, and it is a good way to learn game and immersive interaction design. So far I am glad that I finally took the plunge into Unity, after toying with the idea for a few years. I think it is worth talking about what has changed in my perception and understanding, and why something like Unity is great.

I have wanted to get into game design and more complex visualizations for a while now, and I have had a number of fits and starts. The frameworks I have looked at include pygame, XNA, Play-CLJ (based on libGDX), Panda3D, and rolling my own in Java and C. While I would like to get back to a few of these, namely Play-CLJ and games in C for the XGameStation AVR system, I feel that my best chance of actually creating something is with Unity. The biggest factor so far has been the quality of the Unity tutorials. Making games is a complex task, and some of the truly interesting parts are not the programming. That realization is the biggest change in how I think about writing games: the most important part isn't the how, but the what.

For a novice in game development, being able to experiment quickly with aspects beyond just writing code is critical, and Unity has the most seamless approach of everything I have tried so far. The languages Unity supports out of the box are not especially interesting to me, but they are familiar enough from other work that I can focus on the interactions and the environment.

So far I have spent time on the Roll-a-Ball tutorial project, as well as creating graphs with the built-in particle system. The idea of using the particle system for graphs comes from a tutorial on creating visualizations that I found; it is an interesting approach that was not immediately obvious to me.

Monday, June 2, 2014

Code practice and challenges

I am going to compile a list of code practice, challenge, and ongoing contest sites that are helpful for learning and refining programming skills. A lot of these seem to have popped up lately, and a list of new and old ones might be helpful.

HackerRank focuses on algorithms and AI
code game
Project Euler focuses on math
Rosalind is similar to Project Euler, but with a focus on bioinformatics
CodeChef focuses on training for programming contests and short challenges

Wednesday, May 28, 2014

Learning clojure

Background

In the past I have toyed around with Clojure a little, but never put in significant time to learn the language, even though it has always appealed to me. Now I have started to really dive into it, and I would like to share some of the resources I am using, along with some getting-started info. I will not write a Clojure tutorial here, since I would only be rehashing what others have done. Instead I will lay out the path I am following to better understand the language and its power, for those who may wish to do the same.

My first real exposure to Clojure came from reading Seven Languages in Seven Weeks (which I highly recommend to any non-beginner programmer). While the book is wonderful at exposing people to different ways of thinking about programming, and gives a very real glimpse into several languages, it is an insufficient introduction to any one of them. It is, however, a good starting point for what follows.

Setup

First, I recommend installing Leiningen and starting a REPL by running lein repl on the command line. At this point I would recommend exploring a basic tutorial or the material from Seven Languages, especially if you are new to Lisp in general.

After some light exploring, it is prudent to set up an IDE to work in, and here I turned to an interesting project called Light Table. It complements my style very well, but Emacs is a good alternative for the Lisp diehards out there.

Learning through tests

I really like the Koans projects because they teach through TDD. While you are not learning about TDD itself in these projects, it is still facilitating your learning. In many ways, the habit of making assertions pass in the koans carries over to making them pass in a testing framework like Expectations.

So I recommend working through:

http://clojurekoans.com/
https://github.com/sritchie/core.logic-koans
http://clojurescriptkoans.com/

In addition, after finishing the Clojure koans, I think working through the problem sets on 4clojure will solidify your understanding of and skill with the language. At that point projects are in order; my standard projects when learning a language revolve around artificial life simulations such as John Conway's Game of Life and boids. I recommend following your own standard set of exercises or, if you haven't developed one yet, putting together some cool projects that are not too involved but are still complete pieces of software.

I will finish by saying that the next step for the brave is to dive into "The Joy of Clojure". While the basics are covered again, the book moves at a rapid pace and will fill in many gaps in your functional programming skills if you are unfamiliar with FP.

Wednesday, May 21, 2014

What is programming?

I am currently reading this article, and one of the first questions it puts forth is 'what is programming?'. I want to express some of my thoughts here before continuing to read, and to see what some of you think as well.

Over the years my idea of what programming is has changed drastically, from a mechanical exercise in solving puzzles to what I hope is a more mature definition. In essence, I think programming is the art of unambiguously expressing a set of problems along with a subset of their solution space. Our solutions are generally encoded in a programming language so that they can be exercised against real-world instances of the problem.

As a beginning programmer I often missed the first part (and I am still guilty of it sometimes), and I believe this is partly because of how programming is initially taught. We are usually given well-defined problems to solve, and this continues for quite a while as we learn the mechanics of writing code, even though defining the problem might be the most critical part of development. If you aren't solving a specific, well-defined problem, the code is going to be all over the place and not very good. Maybe we need to find more ways to work this into the early stages of educating developers?

Code and awesome