Tuesday, September 30, 2014

Functional bioinformatics - Nucleotides and GC

In my quest to learn Clojure I have decided to spend a little time applying it to simple problems within my domain, which happens to be bioinformatics. Rosalind is a great site for tackling some of the well-known types of bioinformatics problems. I will post my solutions in Clojure with some explanation of how I approached the problems.

The first is just a simple count of all nucleotides within a string s. This problem can be seen as simply loading a sequence from a file (and stripping out the trailing newline), and then finding and reporting frequencies. I am doing everything within an instarepl tab in Light Table, so there will not be any main function.

Let's look at a first version of the counting code, and see if it is possible to clean it up once the functionality is correct.

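(require 'clojure.string)

(defn displayFrequencies [s]
  {:pre [(string? s)]} ; preconditions must be given as a vector
  ;; count each character, then order the counts by character
  (let [f (into (sorted-map) (frequencies (clojure.string/upper-case s)))]
    (println (vals f))))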


Here we first take a string called s, using a precondition to check the type prior to execution. For this application we are expecting all of our sequences to be represented as strings, though a vector of character literals would be equally valid in reality. The frequencies function does the actual counting of the occurrences of each character in the string. The output of frequencies is a normal map, where keys are not ordered. However, it is helpful to have the entries in lexicographical order (partly because that is the expected form for the problem). A sorted-map is like a hash-map, except that its entries are kept sorted by key.
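To make the difference between the two maps concrete, here is a minimal sketch that can be pasted into a REPL. The input string and the names counts and sorted-counts are made up for illustration and are not part of the solution.

```clojure
(require 'clojure.string)

;; frequencies returns a plain map from each character to its count;
;; the order of its entries should not be relied on
(def counts (frequencies (clojure.string/upper-case "agcttag")))

;; pouring that map into a sorted-map orders the entries by key
(def sorted-counts (into (sorted-map) counts))

(keys sorted-counts) ; => (\A \C \G \T)
(vals sorted-counts) ; => (2 1 2 2)
```

Because the keys come back in a fixed order, the values alone are enough to report the A, C, G and T counts.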

While this works, the binding to f exists only to help with clarity, and there are a bunch of nested calls. This can be rewritten more clearly using the thread-last macro (->>) as below.


(defn displayFrequencies [s]
  {:pre [(string? s)]} ; preconditions must be given as a vector
  (->> s
       clojure.string/upper-case
       frequencies
       (into (sorted-map))
       vals
       println))


We really want to be able to apply this, or another, function to the contents of a file containing a sequence. In order to facilitate this we will create a function that accepts a filename and a function to apply to the string in the file.


(require 'clojure.java.io)

(defn processFile [filename func]
  ;; fn? is the core predicate for functions; the precondition map is now closed
  {:pre [(string? filename) (fn? func)]}
  (->> filename
       clojure.java.io/file
       slurp
       clojure.string/trim
       func))

(processFile "sequence.txt" displayFrequencies)


This sets things up easily to check the GC content of the sequence once that function is available.
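As a rough sketch of what such a function could look like, assuming GC content is taken to mean the fraction of characters in the sequence that are G or C: the name gcContent and the choice to return 0.0 for an empty string are my own assumptions, not part of the solution above.

```clojure
(require 'clojure.string)

;; Hypothetical helper: fraction of G and C characters in a sequence
;; string, ignoring case. Returns 0.0 for an empty string.
(defn gcContent [s]
  {:pre [(string? s)]}
  (let [counts (frequencies (clojure.string/upper-case s))
        gc     (+ (get counts \G 0) (get counts \C 0))
        total  (count s)]
    (if (zero? total)
      0.0
      (double (/ gc total)))))

(gcContent "AGCTATAG") ; => 0.375
```

It could then be handed to processFile in the same way as displayFrequencies, for example (processFile "sequence.txt" gcContent). Multiply the result by 100 if a percentage is wanted.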

Wednesday, September 24, 2014

Local Pypi

Recently at work we realized the need to run a local instance of a PyPI-compatible repository for our Python packaging. In the past I have used Artifactory for Java and Groovy builds, but have not really made a repository for Python. While in many ways the packaging system in Python isn't as mature as in the Java ecosystem, there is still enough there to build a useful system. To me this is a critical step in setting up a development workflow in a given language.

So why is a good package repository so important for a team? There are a lot of ways that sharing libraries and packages can take place, like cloning a repository or using git submodules. While this might be a good way to get started (at least you aren't sending zip files around), it is a pretty ineffective system. Python has its own packaging scheme. One of the big goals with packaging is to have a package or build that has been created through an automated process, and undergone testing. The biggest part of that is the automation of testing. We just need a place to put things following the test and package process.

An easy way to make this usable by developers as well as by a CI pipeline is to package the repository into a Docker container. I use pypiserver, a fairly simple, lightweight server that can be deployed as a local PyPI instance. Below is a Dockerfile that runs it using the Twisted networking framework for the actual server.


FROM ubuntu:14.04
RUN mkdir /packages
VOLUME /packages

RUN apt-get update && apt-get install -y python python-pip python-twisted apache2-utils
RUN pip install pypiserver passlib

# Create the password file used to authenticate uploads (change these credentials)
RUN htpasswd -cb /.htaccess uploadUser changeme

EXPOSE 8080
CMD ["pypi-server", "--fallback-url", "http://pypi.python.org/simple", "-P", "/.htaccess", "--server", "twisted", "/packages"]


The .htaccess file allows a user named uploadUser to publish to this repo using the password changeme. Both should be changed. An example .pypirc entry that you would need when uploading is below.


[distutils]
index-servers =
    local

[local]
repository: http://127.0.0.1:8080
username: uploadUser
password: changeme


Then run

python setup.py sdist upload -r local

Saturday, September 6, 2014

Unity 3D first experiences

This weekend I started to learn Unity 3D. Picking Unity as a thing to learn was nearly a necessity after my choice to purchase an Oculus Rift DK2. While other options are supported, Unity seems to meet my needs for other projects and is also a good way to learn game and immersive interaction design. So far I am glad that I finally took the plunge into Unity, after toying with the idea for a few years now. I think it is important to talk about what has changed in my perception and understanding, and why something like Unity is great.

I have wanted to get into game design and more complex visualizations for a while now, and I have had a number of fits and starts. The frameworks that I have looked at include Pygame, XNA, play-clj (libGDX based), and Panda3D, as well as rolling my own in Java and C. While I would like to get back to a few of these, namely play-clj and games in C for the XGameStation AVR system, I feel that the best chance of success at truly creating something is through Unity. The biggest factor so far has been the quality of the Unity tutorials. Making games is a complex task, where some of the truly interesting parts are not the programming. That is the biggest change in my thought process about writing games. The part that is most important isn't the how, but the what.

For a novice in games, being able to experiment quickly with aspects outside of just writing code is critical. I feel as though Unity has the most seamless approach of all the things I have tried thus far. The languages that are supported out of the box are not really interesting to me, but they are familiar enough from other things I have done that I can focus on the interactions and environment.

So far I have spent time working with the roll-a-ball tutorial project as well as creating graphs using the built-in particle system. I found this tutorial for creating visualizations, which is where the idea of using the particle system for graphs comes from. It is an interesting approach that was not immediately obvious to me.