A list of lists of PhD resources

On your way to becoming a PhD, you not only have to learn how to do research, you also have to learn how to communicate your ideas comprehensibly in text and speech, how to build the tools you need, and how to survive in the microcosm of supervisors, colleagues and undergrad students of your research lab. But you are not the first to go through all this, and people have written extensive advice for every problem you might encounter. And since lists are so popular right now, I present to you my:

List of lists of PhD resources for computer scientists.


List of lists of lists!?

The most condensed summary I found is on the website of my work group, IUPR. It is a good starting point and gives an overview of all the things one has to keep in mind and pay attention to.

From the most condensed to the most comprehensive. This collection links to nearly 100 articles on Ph.D. dissertation/research, presentations, writing, reviewing/refereeing, being a faculty member, job hunting, learning English and more. The list is overwhelming.

Links to documents on giving talks and writing papers and proposals.


If, like me, you did not actually study computer science, or your courses mainly covered logic and reducing NP-complete problems, this site can probably help you a lot. Software Carpentry is about learning the skills to write reliable software and to use existing tools efficiently. The website offers tutorials on basic programming, version control, testing, using the shell, relational databases, matrix programming, program design, spreadsheets, data management, and software life cycles.

[UPDATE] How could I forget:

Like the well-known Stackoverflow.com, Academia is “a collaboratively edited question and answer site for academics and those enrolled in higher education.” It is still in its beta phase, but growing every day. I like that it will always be more current and extensive than pages maintained only by individuals or single work groups. And if you can’t find the information you need, you can always ask for help.

So what do you think? Do you find these resources useful? Some of them are already quite old. Do you think they are obsolete? What are your tips? Which collections did I forget?

Collabgraph [Follow up]

Black Hole

When collabgraph divides by 0
from @nelas

I was stunned to see that more than 600 people had tried Collabgraph in the past months. This is a huge number, considering my expectations and its niche use case. So thank you for taking the time and leaving feedback!

Developing Collabgraph and taking part in Mendeley’s Binary Battle was a great experience. And without your support I wouldn’t have made it into the top 10. I am especially grateful to everyone who voted for me in the final vote!

These black holes are created by papers with hundreds of authors, which are apparently quite common in the genome research community.
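Why such papers collapse into black holes follows from how the graph is built: since Collabgraph draws an edge between every two authors who wrote a paper together, each paper with n authors adds a complete subgraph (a clique) to the graph. The edge count grows quadratically; n = 300 below is only an illustrative value, not taken from the actual data.

```latex
|E_{\text{paper}}| = \binom{n}{2} = \frac{n(n-1)}{2},
\qquad n = 3 \;\Rightarrow\; 3,
\qquad n = 300 \;\Rightarrow\; \frac{300 \cdot 299}{2} = 44\,850
```

So a single 300-author paper contributes roughly fifteen thousand times as many edges as a typical three-author paper, which is enough to swallow everything else in the layout.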

Use all the cores!


Use all the cores with GPGPU and HPC

For the last few months I have been diving into GPGPU programming with (Py)CUDA. Whenever you work with images, you face huge amounts of data, so more speed is always welcome. CUDA promises speedups of up to 300x, but at the price of having to implement more or less everything yourself at a low level. At times this can be very challenging, so to cheer myself up during the hard times I made this picture based on a popular internet meme. I also think the GPGPU and HPC communities need some catchy visuals. Read more about this and other memes at Know Your Meme, X all the Y.

Let me know what you think, and spread it!

Follow-up on Mendeley’s Binary Battle

Last week, Mendeley announced the winners of its Binary Battle. First place goes to openSNP:

With openSNP, you can share your personal genome from 23andMe or deCODEme to find the latest relevant research and let scientists discover new genetic associations. Werner Vogels, Amazon CTO and one of our star judges said this, “OpenSNP is cool. I have uploaded my genotype, and it is interesting to see it at work.”

Runner-up is PaperCritic:

PaperCritic allows for post-publication peer review in an open environment. Rate papers, write critical reviews or read those from others.

Third place goes to rOpenSci,

which provides R-based tools to facilitate Open Science; including R packages for both Mendeley and PLoS.

I am more than content that my entry, Collabgraph, made it into the top 10, and I am thankful to everybody who voted for me last month. Taking part in the Binary Battle was a valuable experience. Thank you again, Paddy, for hosting Collabgraph!


Vote for Collabgraph!

For the people who don’t follow me on Twitter:

Collabgraph made it into the Top10 of Mendeley’s Binary Battle!

I am really proud, especially considering the high quality of all the other contenders. If you tried Collabgraph and found it useful or fun, I would be more than happy if you’d vote for me! The public votes and the decision of the (surprisingly high-profile) judges will be aggregated to select the winner. The first prize consists of $10,000, 1,000 Amazon Web Services credits and limitless fame! So please, if you liked it

Vote for Collabgraph!  😀

Can Stack Exchange save scientific peer review? [Update]

One of the few things everybody seems to agree on is that the scientific review process, especially in computer science, is broken. I won’t go into details here, as there are many sources on the net.

Personally, though, I found Yann LeCun’s pamphlet for “A New Publishing Model in Computer Science” inspiring. He proposes an open, karma-based online repository, which I will summarize as follows:

  • In this system, authors post their papers as soon as they feel they are finished. The publication is put under version control and is immediately citable.
  • “Reviewing Entities” (RE), individuals or groups like editorial boards, then choose papers they want to review or accept review requests from authors.
  • REs do not “own” papers exclusively, so an RE can choose to review any paper at any time. Papers can be reviewed by multiple REs.
  • The reviews are published with the paper and are themselves citable documents like regular publications.
  • Reviews are also rated by readers. Good reviews earn “karma” points for the RE, reflecting how useful its reviews are.
  • Additionally, an RE’s “karma” increases if it is the first to positively review a paper that is later rated as high quality by other REs as well. As a result, REs have an incentive to be the first to review good papers.
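To make the incentive structure in the summary above more concrete, here is a minimal Python sketch of how such karma could be computed. It is my own illustration, not LeCun’s specification: the class names, the bonus of 10 points, and the rule that a paper counts as “high quality” once at least two other REs have reviewed it positively are all assumptions.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Review:
    reviewer: str          # the Reviewing Entity (RE): an individual or a group
    paper: str
    positive: bool
    reader_votes: int = 0  # net up/down votes from readers on this review


class Repository:
    """Hypothetical in-memory model of a karma-based review repository."""

    def __init__(self, first_reviewer_bonus=10, endorsements_needed=2):
        self.reviews = []  # kept in submission order, so "first" is well defined
        self.first_reviewer_bonus = first_reviewer_bonus    # assumed value
        self.endorsements_needed = endorsements_needed      # assumed threshold

    def review(self, reviewer, paper, positive):
        # Any RE may review any paper at any time; nobody "owns" a paper.
        r = Review(reviewer, paper, positive)
        self.reviews.append(r)
        return r

    def karma(self):
        scores = defaultdict(int)
        # 1) Readers' ratings of a review are credited to the RE that wrote it.
        for r in self.reviews:
            scores[r.reviewer] += r.reader_votes

        # 2) Bonus for the first RE to review a paper positively, once enough
        #    *other* REs have also reviewed that paper positively.
        first_positive = {}
        endorsers = defaultdict(set)
        for r in self.reviews:
            if r.positive:
                first_positive.setdefault(r.paper, r.reviewer)
                endorsers[r.paper].add(r.reviewer)
        for paper, pioneer in first_positive.items():
            if len(endorsers[paper] - {pioneer}) >= self.endorsements_needed:
                scores[pioneer] += self.first_reviewer_bonus
        return dict(scores)


if __name__ == "__main__":
    repo = Repository()
    first = repo.review("RE-A", "paper-1", positive=True)
    repo.review("RE-B", "paper-1", positive=True)
    repo.review("RE-C", "paper-1", positive=True)
    first.reader_votes = 3
    print(repo.karma())  # {'RE-A': 13, 'RE-B': 0, 'RE-C': 0}
```

RE-A collects 3 points from readers plus the early-reviewer bonus, because two other REs later agreed with its positive verdict. Tying the bonus to *later* agreement, rather than to the review itself, is what rewards spotting good papers early.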

I will not repeat LeCun’s explanation of how it works in detail and why it would be superior to the existing system. Instead, I want to point out how similar this approach is to the Stack Exchange (SE) Q&A websites. Stack Exchange is a network of over 70 Q&A websites, with stackoverflow.com, a Q&A site on programming, being the first and largest one. On Stack Exchange websites, anyone can ask questions, which can be answered by all members of the community. Both questions and answers are rated by the community, so users are incentivized to write useful answers to questions that are relevant to many other users in order to gain reputation.

Especially if you have used an SE website, it is hard to ignore the similarities. Even though the SE framework was built to solve a different problem, I can see it being adapted to act as a repository for LeCun’s publishing model: publications would be questions and reviews would be answers. I can make out only the following necessary changes:

  • There needs to be support for groups (REs),
  • high level users should not be permitted to change other people’s posts anymore and
  • the ‘answer’ functionality has to be removed.

Anyone who follows the two founders of Stack Exchange, Jeff Atwood and Joel Spolsky, knows how determined both are to prevent any deviation from their vision for Stack Exchange, so such a site could not officially become part of the SE community. But there is also OSQA, the open-source clone of SE. Building on it, implementing the necessary features seems feasible.

So, what do you think? Can Stack Exchange save scientific peer review?


LeCun was generous enough to comment on my article via e-mail. He confirmed that his views on the peer review process and his model haven’t changed, and he agrees that creating the technical infrastructure shouldn’t be too hard. He has already received several offers from potential volunteers, but the project still lacks a highly competent developer (or team) to “own” it.

Disclaimer: I am not the first to bring Stack Exchange to the table, but I found the other approach far less concrete.

Paper: Rendering Synthetic Objects into Legacy Photographs

Inserting 3D objects into existing photographs


This fascinating video presents a new method to insert 3D objects into existing photographs. It is based on the research of Kevin Karsch, Varsha Hedau, David Forsyth and Derek Hoiem (all University of Illinois at Urbana-Champaign). Their main contribution is the algorithm that generates the lighting model for the scene. The algorithm needs only one photograph and a few manual annotations by a novice user, together with a ground truth data set, to create a nearly lifelike insertion. The ground truth data set was generated from 200 images of 20 indoor scenes under varying lighting conditions.

The video is well done and I am surprised at what’s possible, but I would like to see how much user input is really necessary and how well the algorithm and the ground truth perform on other images. What do you think?

More details can be found at Kevin Karsch’s website.

Computer Vision News

I created Computer Vision News (CVN), an aggregator for all the events and academic vacancies within the fields of Computer Vision, Image Analysis, and Medical Image Analysis. You can also follow it on Twitter @compvisionnews!

At the moment, I use the following sources:

Please write to me if you have sources I should add. I am happy to extend CVN.
I prefer to have just the headlines in my Twitter timeline, where they don’t clutter my mail client or my feed reader. But use it as you like! Yay for more choices!

Who is collaborating?

Collaboration graph of master thesis created with collabgraph

In my scarce spare time, I have written Collabgraph to visualize connections between authors of scientific publications.

This Python script reads a BibTeX file (yours) and draws a graph in which the nodes are authors and an edge means that the two authors have collaborated (or at least written a paper together).

On the right is the graph created from the references used in my diploma thesis. You can immediately see the central role Eakins, Meier and Flickner played.
Collabgraph requires only the pygraphviz library, which can be installed with “easy_install pygraphviz”.
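The idea behind the script can be sketched in a few lines of Python. This is not Collabgraph’s actual source (that lives on bitbucket.org and uses pygraphviz); it is a minimal, dependency-free illustration that assumes brace-delimited `author = {...}` fields with names separated by “and”, and writes the graph as Graphviz DOT text. The function names are hypothetical.

```python
import re
from collections import Counter
from itertools import combinations


def author_fields(bibtex):
    """Yield the raw value of every brace-delimited author field."""
    for match in re.finditer(r"\bauthor\s*=\s*\{", bibtex, re.IGNORECASE):
        # Track brace depth so nested braces (e.g. accented names) stay intact.
        # Assumes the braces are balanced.
        depth, i = 1, match.end()
        while depth and i < len(bibtex):
            if bibtex[i] == "{":
                depth += 1
            elif bibtex[i] == "}":
                depth -= 1
            i += 1
        yield bibtex[match.end():i - 1]


def coauthor_edges(bibtex):
    """Count how many papers each pair of authors wrote together."""
    edges = Counter()
    for field in author_fields(bibtex):
        # BibTeX separates authors with the keyword "and"; normalize whitespace.
        names = {" ".join(n.split()) for n in re.split(r"\s+and\s+", field.strip())}
        names.discard("")
        # Every pair of co-authors on the same paper gets (another) edge.
        for pair in combinations(sorted(names), 2):
            edges[pair] += 1
    return edges


def to_dot(edges):
    """Render the co-author counts as an undirected Graphviz graph."""
    def quote(name):
        return '"' + name.replace('"', '\\"') + '"'

    lines = ["graph collab {"]
    for (a, b), count in sorted(edges.items()):
        lines.append(f"  {quote(a)} -- {quote(b)} [weight={count}];")
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical sample entries, not taken from a real bibliography.
    sample = """
    @article{first, author = {Doe, Jane and Roe, Richard}, title = {A}}
    @inproceedings{second, author = {Smith, Ann and Doe, Jane and Roe, Richard}, title = {B}}
    """
    print(to_dot(coauthor_edges(sample)))
```

For the two sample entries, Doe and Roe get an edge of weight 2 because they share both papers. If pygraphviz is installed, something like `pygraphviz.AGraph(string=dot).draw("collab.png", prog="neato")` should turn the DOT text into an image.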

You can find the source code and the example on bitbucket.org.

I am looking forward to your feedback!!!

How to create good and fast Matlab code

As most readers of this blog land on one of the pages about my Matlab applications, I thought I’d collect some of the resources I use to write Matlab code.

First, start with the official MathWorks help on how to write good code,

then there is the 33-page tutorial “Writing Fast MATLAB Code” (PDF),

followed by the recorded webinar “Handling Large Data Sets Efficiently in MATLAB”.

For asking questions, I enjoy the Stackoverflow community. Here are two examples of the answers you get to general Matlab questions.

So, I hope you find these links more helpful than overwhelming.

Please leave a comment if you have anything to add!