Even with the ever-growing interest in deep learning, I still find myself using the bag-of-visual-words (BoW) approach, if only to have a familiar baseline to test my fancy new algorithms against. I especially like the BoW demo script from the VLFeat team, which reaches a solid 65% accuracy on the admittedly outdated Caltech101 dataset. The script has the advantage that it contains all the usual steps in one place (feature extraction, training of the classifier, and evaluation of the whole pipeline) and that it can easily be adapted to other datasets.
The only problem is that it is a Matlab script, and in my experience Matlab licences are often scarce, even at research institutes, because of their high price. So I rewrote the script in Python using the incomplete VLFeat Python wrapper.
As usual, you can find my code on GitHub: https://github.com/shackenberg/phow_caltech101.py
If you are just diving into the world of BoW, I recommend my minimal BoW image classifier code, which might be easier to understand.
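If you have never seen the core BoW step, here is a rough sketch of it in plain Python. This is not the code from either repository and not VLFeat's API: the function names `nearest_word` and `bow_histogram` and the toy data are made up for illustration. It assumes you already have a list of local descriptors for an image and a vocabulary of visual words (in a real pipeline these would come from a feature extractor and a clustering step such as k-means), and it only shows how an image becomes a normalized histogram that a classifier can be trained on.

```python
from collections import Counter


def nearest_word(descriptor, vocabulary):
    """Return the index of the visual word closest to the descriptor
    (squared Euclidean distance)."""
    def sq_dist(word):
        return sum((d - w) ** 2 for d, w in zip(descriptor, word))

    return min(range(len(vocabulary)), key=lambda i: sq_dist(vocabulary[i]))


def bow_histogram(descriptors, vocabulary):
    """Assign every descriptor to its nearest visual word and return
    the L1-normalized histogram of word counts."""
    counts = Counter(nearest_word(d, vocabulary) for d in descriptors)
    total = len(descriptors)
    return [counts[i] / total if total else 0.0 for i in range(len(vocabulary))]


# Toy data (hypothetical): 2-D "descriptors" and a vocabulary of 3 words.
vocabulary = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
descriptors = [(0.1, -0.1), (0.9, 1.2), (1.1, 0.8), (4.8, 5.3)]

histogram = bow_histogram(descriptors, vocabulary)
print(histogram)  # [0.25, 0.5, 0.25]
```

The histogram has one bin per visual word, so every image ends up as a fixed-length vector no matter how many descriptors were extracted from it. That fixed length is what makes it possible to feed the result into a standard classifier.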