Testing a Django application with a large dataset
Posted by skruk in Development on 2012/01/16
Here is the thing: you are developing an application in Django and you want to test the REST API you have delivered. It is a simple matter if that API does nothing but create, update, retrieve and remove elements of a given type; you just need to create some dummy data, store it as JSON and load it as fixtures.
But what about when part of your REST API is used mainly for retrieving information, and … in order to test it you cannot simply rely on dummy data, or at least not on simple dummy data? How do you prove that your information retrieval solution can really work on a large set of interconnected resources?
You can still try to run python manage.py dumpdata on your production database and end up with JSON weighing gigabytes. Then you can hope that it will load painlessly as fixtures (which is often not the case) and, last but not least, that the fixture loading process will not take forever and two days.
The obvious thing would be to simply SQL dump your data and load it into the test_ database. But when and how?
Here are a couple of simple steps:
- SQL dump your data as CSV files – one for each table ($table_name.csv)
- Create a file (load_data.lst) with the list of tables to be loaded
- Put both the CSV files and the manifest (the load_data.lst file) into the sql folder in your Django app, next to your tests.py
- Extend your usual implementation of the TestCase with your own _fixture_setup method:
- And here is the definition of the load_db function:
- UPDATE: once you start writing tests on this large dataset you need to make sure you know what kind of results to expect, and you cannot update the dataset from then on. Otherwise your tests are very likely to break.
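As a rough illustration of the steps above, here is a minimal sketch of what a load_db-style loader could look like. It is not the original implementation. It assumes that every CSV file starts with a header row naming the columns, that the manifest lists one table name per line, and that the tables already exist. The signature, the `manifest` and `placeholder` parameters and the returned dictionary are all made up for this example, and plain sqlite3 stands in for the Django test database so that the sketch stays self-contained.

```python
import csv
import os
import sqlite3


def load_db(conn, sql_dir, manifest="load_data.lst", placeholder="?"):
    """Load <table>.csv for every table listed in the manifest.

    conn is any DB-API connection; sqlite3 is used here only as a stand-in.
    Returns a dict mapping table name to the number of rows inserted.
    """
    # Manifest format (assumed): one table name per line, blank lines ignored.
    with open(os.path.join(sql_dir, manifest)) as fh:
        tables = [line.strip() for line in fh if line.strip()]

    loaded = {}
    for table in tables:
        csv_path = os.path.join(sql_dir, table + ".csv")
        with open(csv_path, newline="") as fh:
            reader = csv.reader(fh)
            columns = next(reader)  # assumed: first row holds column names
            rows = [tuple(row) for row in reader]

        # Table and column names come from our own trusted files, so they are
        # interpolated directly; the values go through bound parameters.
        sql = "INSERT INTO %s (%s) VALUES (%s)" % (
            table,
            ", ".join(columns),
            ", ".join([placeholder] * len(columns)),
        )
        cursor = conn.cursor()
        cursor.executemany(sql, rows)
        loaded[table] = len(rows)

    conn.commit()
    return loaded
```

In a Django test case, the overridden _fixture_setup would presumably call a function like this with the test database connection and the folder holding the CSV files; the placeholder style and the transaction handling depend on the Django version and database backend, so treat those parts as things to adapt.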
Simple as it seems – that’s all – you’re done. Enjoy!
Photo strips made easy
Posted by skruk in Photography on 2011/05/08
Here is the thing: imagine you have been taking a whole bunch of photos of quite a similar scene. After reviewing all of them you made a short list of, say, the four best. You wish you could narrow this list down to only ONE photo.
And, here comes a thought: let’s do a photo strip.
Of course you could start Photoshop, Gimp, or any other photo editing program you like, and start stitching them all together, adding border, etc.
What if these photos were taken with your state-of-the-art 12-megapixel camera? And your computer/laptop is quite far from being called state of the art?
I tried doing it once with my MacBook Pro (2007). Almost burned this poor machine.
Luckily I am not afraid of bash scripting and tools like ImageMagick.
Here is a simple script that does the stitching for us. My MBP managed to do the hard work in just a few seconds.
#!/bin/bash
FILES=$1
DIR=$2
WIDTH=$3
HEIGHT=$4
STEP=$5
OUTPUT=$6
if [ "$FILES" = "" ];
then
echo "Need 1st parameter: FILES [\"*.tif\"]"
exit 1
fi
if [ "$DIR" = "" ];
then
echo "Need 2nd parameter: DIR [1]"
exit 1
fi
if [ "$WIDTH" = "" ];
then
echo "Need 3rd parameter: WIDTH [500]"
exit 1
fi
if [ "$HEIGHT" = "" ];
then
echo "Need 4th parameter: HEIGHT [500]"
exit 1
fi
if [ "$STEP" = "" ];
then
echo "Need 5th parameter: STEP [0.04]"
exit 1
fi
if [ "$OUTPUT" = "" ];
then
echo "Need 6th parameter: OUTPUT file"
exit 1
fi
if [ "$DIR" = "w" ];
then
DX=1
DY=0
BORDER=`echo "$STEP*$WIDTH" | bc`
else
DX=0
DY=1
BORDER=`echo "$STEP*$HEIGHT" | bc`
fi
ls $FILES | \
awk -v DX=$DX -v DY=$DY -v W=$WIDTH -v H=$HEIGHT -v BORDER=$BORDER -v STEP=$STEP -v OUTPUT="$OUTPUT" \
'BEGIN { print "convert \\"; X=0; Y=0; } \
{ print "\\(",$1,"-resize "W"x"H"^ -repage +"X"+"Y" \\) \\"; X=X+(1+STEP)*(DX*W); Y=Y+(1+STEP)*(DY*H) } \
END { print "-background black -fill black -compose Over -mosaic -bordercolor black -border "BORDER,OUTPUT; }' |\
sh -
To execute the script simply call:
./script.sh "Photo-?.tif" w 500 500 0.04 Photos-strip.tif
where parameters are:
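- Photo-?.tif – tells the script to process all of Photo-1.tif, Photo-2.tif, … in the current folder
- Direction of the strip: ‘w’ places the photos side by side (a horizontal strip); any other value, e.g. ‘h’, stacks them one above another (a vertical strip)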
- Width of each photo in the strip
- Height of each photo in the strip
- Fraction of width/height for the outside and inside borders
- Name of the output strip file
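To make the effect of these parameters easier to see, the layout arithmetic that the awk program performs can be mirrored in a few lines of Python. This is only an illustrative sketch: the function name strip_layout and its return shape are invented for this example, and no images are touched. It just computes the per-photo offsets passed to -repage and the outer border width.

```python
def strip_layout(count, direction, width, height, step):
    """Mirror the offset arithmetic of the stitching script.

    Returns (offsets, border): offsets is a list of (x, y) positions, one per
    photo, and border is the width of the outer border in pixels.
    """
    # 'w' advances along the x axis; anything else advances along y,
    # exactly as the DX/DY switch in the bash script does.
    if direction == "w":
        dx, dy = 1, 0
        border = step * width
    else:
        dx, dy = 0, 1
        border = step * height

    offsets = []
    x = y = 0.0
    for _ in range(count):
        offsets.append((x, y))
        # Each photo is followed by a gap of step * (width or height).
        x += (1 + step) * dx * width
        y += (1 + step) * dy * height
    return offsets, border


if __name__ == "__main__":
    offsets, border = strip_layout(4, "w", 500, 500, 0.04)
    print(offsets, border)
```

With the example call above (four photos, ‘w’, 500 by 500, step 0.04) the photos sit roughly 520 pixels apart along the x axis, which leaves a gap of about 20 pixels between neighbours, the same width as the outer border.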
And here is the output:
Enjoy!
Happy Easter 2010
Posted by skruk in Life, Semantic Digital Libraries on 2010/04/03
Guess this is a pretty special Easter holiday this year; at least for me …
Two months ago I published my PhD thesis on the usability of information discovery in Semantic Digital Libraries as a book; a few days ago I received my copy of the hardcover version.
This is also the iPad weekend, and my book is also published in the ePUB format, which makes it iPad-ready.
There are also other things on the horizon, which I will let you know about pretty soon…
Therefore, I would like to wish you all a very Happy Easter.
(Psst, there is a 25% discount on my recent book as an Easter gift)
New book on Semantic Digital Libraries
Posted by skruk in Announcement, Semantic Digital Libraries, thesis on 2010/02/12
You have probably heard by now, but if you haven’t – here is the big news: I have published a second book on Semantic Digital Libraries: Improving Usability of Information Discovery with Semantic and Social Services.
Unlike the previous one, it is not a compilation of articles contributed by myself and my colleagues; this book is based on my thesis. The book covers the most important aspects of what Semantic Digital Libraries are and what it is that they can offer. I present a very thorough review of the literature describing various advanced digital library projects and components. Based on the identified requirements I propose an architecture, a data model and a classification of ontologies for semantic digital libraries. I describe two example information retrieval and knowledge management techniques, which utilize semantic web and social networking technologies. Finally, I briefly describe the JeromeDL system and provide a very detailed and thorough analysis of an evaluation comparing the usability of end-user services offered by a semantic digital library with those offered by a classic digital library.
The book is available for purchase at lulu.com.

Since I understand that not everyone can afford to buy a complete, hardcover version, I have prepared an array of different versions (see below) of this book, ranging from Hardcover through Paperback to E-Book. There is also a lite version of this book, which does not contain the attachments.
| | Hardcover | Paperback | E-Book |
|---|---|---|---|
| Full version | ISBN: 978-1-4452-7770-7 €50 | ISBN: 978-1-4452-8243-5 €30 | PDF €15 |
| Lite version* | ISBN: 978-1-4452-8864-2 €40 | Lite Paperback version €20 | ePUB** €15; Amazon Kindle $14.99+Taxes |
* NOTE: Lite version does not contain appendix
** Open standard format, compatible with, e.g., iPad; DRM-enabled, compatible with Adobe Digital Editions
I hope you will find this book a good read and very helpful for your studies and work.
(Pssst, come and visit our Facebook group tomorrow: you will learn how to get the book cheaper over the next couple of days)
New “home” for Semantic Digital Libraries
Posted by skruk in Announcement, Semantic Digital Libraries on 2010/02/04
January was madness: I was hoping for this year to be more relaxed after all the rush with the thesis and the company in 2009. But it did not start that way; I guess I should blame my workaholic attitude. Maybe I should do what my supervisor once did: hang a note (where I can see it) saying “Just say NO”.
I have been asked many times recently about materials regarding Semantic Digital Libraries but could not really point to just one location. Yes, there are all my papers at http://library.deri.ie (which happens to be down quite often ever since I left the institute), and many of my presentations are on slideshare, and then there is my book, and my thesis, and my tutorials … I realized that there has to be (finally) a place where I can gather and reference all of that.
And here it is: http://semdl.info/. At the moment you will find there all major presentations I did related to the topic, archive information about tutorials we gave (together with complete slides from most of them), and references to two of my books on Semantic Digital Libraries. Most likely I will use our infrastructure to set up a JeromeDL with all the papers on the subject and reference them there.
But it’s not the end: I do not want it to be a one-man show. I hope that everyone else who is interested in the subject will help me fill the site with more materials and bring it to life (someone already suggested a blog). Please let me know if you want to join the effort.
What was my thesis about?
In case you were wondering what my PhD thesis was about – here are some tips (thanks to Wordle):
Semantic School – after the first month
Posted by skruk in Knowledge Hives on 2009/10/20
That was a quick one. I could hardly say “Semantic” and we are already one month down the road with daily posts at our Semantic School blog.
We have published 21 articles so far. We went from reviewing the Semantic Layer Cake to introducing OWL yesterday.
It is too early to tell, but we have already heard voices of appreciation. Hopefully, over the next month or so we will gather and teach a large community of people interested in semantic technologies.
If you have not been to Semantic School so far, be sure to visit http://www.semanticschool.com/
My research and knowledge workers
Posted by skruk in digime, Knowledge Hives, Research on 2009/09/23
Just had a very interesting conversation on how my research could help knowledge workers, e.g., people responsible for producing documentation in large corporations.
I have to say, I am glad that whatever I wrote here helped someone connect their work with my research. To be frank, when I invented notitio.us 2 years ago, knowledge workers were among the people I was thinking about.
My PhD Viva Voce on Video
Posted by skruk in Corrib, DERI, Knowledge Hives, Semantic Digital Libraries, thesis on 2009/09/18
A few days ago I found enough time to write about the last stages of my PhD process, namely my viva voce and what followed afterwards.
A friend of mine, Lukasz, told me that he still had the videos he recorded during my presentation. He was kind enough to publish them on Vimeo.
Here they are:
Sebastian’s PhD Deffence 1 from Lukasz Porwol on Vimeo.
Tutorial on Semantic Digital Libraries at ICSD’09
Posted by skruk in Knowledge Hives, Semantic Digital Libraries, thesis on 2009/09/17
Last week I travelled to Trento, Italy.
I was invited by the organizers of the International Conference for the Semantic Web and Digital Libraries (ICSD’09) to give a full-day tutorial on Semantic Digital Libraries.
The key goal of this conference is to bring together researchers and practitioners working on solutions that span these two worlds: Semantic Web and Digital Libraries. Even though these two research lines have so much in common, arriving at a joint mindset proved to be quite a bit of a problem.



