Remember the sabbath day

remember_the_sabbath

I wrote a small Python script that checks on how many weekend days Git commits were made. The idea is simple: mark only the days on which commits were made, without worrying about the number of commits or the size of the changes. It is up to us to interpret the graph, but if you apply it to a project that has been running for a very long time, you can read a few things out of it.
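The counting itself can be sketched in a few lines of Python 3. This is not the code from the repository, only an illustration of the idea: the function name `weekend_share` and the sample dates are made up, and in practice the dates could come from something like `git log --format=%ad --date=short`.

```python
from datetime import date


def weekend_share(commit_dates):
    """Return the fraction of distinct commit days that fall on a weekend."""
    # A day counts once, no matter how many commits were made on it.
    active_days = set(commit_dates)
    if not active_days:
        return 0.0
    # date.weekday() returns 5 for Saturday and 6 for Sunday.
    weekend_days = [d for d in active_days if d.weekday() >= 5]
    return len(weekend_days) / len(active_days)


# Hypothetical sample: Monday (twice), Saturday, Sunday and Wednesday.
sample = [
    date(2024, 1, 8),
    date(2024, 1, 8),
    date(2024, 1, 6),
    date(2024, 1, 7),
    date(2024, 1, 10),
]
print(weekend_share(sample))
```

Counting distinct days rather than commits is what keeps one very busy Saturday from dominating the result.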

Look at the picture above. The graph in the top left corner is from a commercial project. Clearly, nobody cares about it over the weekend. The percentage of activity is around 20% on each working day, which means that the work has been spread proportionally.

The graph in the bottom right corner is from the Django repository. This is more or less what all big open source projects with a lot of participants look like: the work is spread evenly over the entire week.

You can find out more here:
https://github.com/aerkalov/sabbath

Booktype developer’s review: The Django Circus comes to town!

Original post is here:
https://www.sourcefabric.org/en/community/blog/1878/Booktype-developer%27s-review-The-Django-Circus-comes-to-town!.htm

If only I had known how awesome the Django community is, I would have started attending DjangoCons much earlier. This year the European DjangoCon was held in Warsaw from the 15th to the 19th of May and, believe it or not, it took place in a circus tent! It was organised by members of the local Django community, and they have really raised the bar for the next DjangoCon in France. Because Django is the central framework behind Booktype, I decided to go and see this community of developers in action, and aside from learning a lot of cool new tricks I also had an amazing time!

Fun in the sun

The first three days of the conference were held at the horse track, next to a weirdly big fountain and an old swimming pool. There was only one track at a time, and that proved to be more than enough. The tent was open to the outside, providing good airflow and easy access to the lectures. It was outfitted with power, Wi-Fi, projectors, tables for those who cannot escape their work and even a refrigerator filled with cold water and drinks. During the day people could easily grab coffee, fruit, sandwiches, homemade energy bars, ice cream and popcorn in the smaller tent. The organisers provided us with hammocks, deckchairs, bean bags, blankets, frisbees, badminton rackets and many other toys. Many times it felt like we were at a music festival and not a developers' conference.

Django experts visit the festival

But it was not just about the food and frisbee. The event was packed with interesting people and projects, and this is what conferences are all about: meeting new people. Russell Keith-Magee (President of the Django Software Foundation), Andrew Godwin (Django and South), Tom Christie (Django REST Framework), Kenneth Reitz (Requests and Python Software Foundation), Steve Holden (Python Software Foundation), Zed Shaw and Aymeric Augustin (Django) were all present, just to name a few.

You could feel the spirit of open source and the Django community at the festival. Everyone was more than friendly and very approachable. I guess all the stories about Django and Python having some of the friendliest communities are true after all! The best place to see this was during the last two days of the conference. The coding sprint was organised at the Gamma Factory, and on the first day it was attended by more than 200 people. Instead of junk food the organisers provided us with healthier meals. But enough about the food. Members of the core Django team and developers who had already contributed to Django were more than helpful and patient with the newbies and with people who just wanted to help with coding, documentation, testing or sometimes just organising old tickets in the system.

It was a very positive and enriching experience. I encourage you to think about participating in DjangoCon next year, or to just visit one of the local Django/Python gatherings in your country. You will not know how stimulating it can be until you try it!

As they say, a picture tells a thousand words! Check out the video from the festival.

Booktype Easy Install

A lot of people seem to have problems installing Booktype. Until we have native packages for each distribution, installing vanilla Booktype from the git repository will be the way to go.

I love all of my users the same, and that is why I decided to work on a small install script that does most of the steps for them. This script should help you even when your distribution does not provide valid native packages (old Django packages, etc.). Anyhoo, I have done some basic testing on Ubuntu 10.04, Ubuntu 12.04, CentOS 6.3 and Debian 6, and I have tried to avoid the combination of outdated system packages and the latest packages installed with pip (because that is what Google told you to do). I am aware there are many places where this script could go wrong, but I have tried to fix at least the basic problems. For instance, if you are on CentOS it will warn you and tell you how to install the EPEL repository.

Like all “double click” methods, this script will also install unwanted packages. To avoid conflicts between different versions of Python modules, it will create a virtualenv environment and install only those Python packages (and their versions) you really need. It will install packages using Python package tools and not system packages. For some of the packages this means compiling, and compiling means having development tools on the system. Be aware of that.

The created Booktype instance will use SQLite to store data. SQLite is a great database, but it does not support easy schema upgrades. If you are serious about Booktype, you might want to look at PostgreSQL. After all, this is version 0.0.1 and we only support the built-in Python web server. To make it work with Apache you will need to manually activate the virtualenv environment in the wsgi script. In the future the install script will do this for you.

Where to get it

For now, the Booktype Easy Install script is part of the “Booktype Scrolls” project and you can get it here:
https://github.com/aerkalov/booktype-scrolls/tree/master/scripts/install

How to start it

Download the script and execute it. It will download and install the required packages. When it is done, it will tell you how to create a user account and start Booktype. You don’t need to be root to run this install script, but you do need sudo permissions in case it must install some system packages. You will need to confirm the installation of new packages, and you will be informed which commands are being executed in the background. New packages will not be installed on the system without your permission. Feel free to analyze the script before you run it.

wget https://raw.github.com/aerkalov/booktype-scrolls/master/scripts/install/booktype_install.py 
python booktype_install.py

Use --help to get all available arguments:

python booktype_install.py --help
python booktype_install.py -p myproject
python booktype_install.py -o ubuntu -p book

What now

It needs testing and support for more platforms. Please report to me if you have problems with it. We will try to fix it together!

Booktype scrolls – Importing WordPress site!

So… you left your job and went traveling throughout Asia for 4 months. In a small cyber cafe, somewhere in south Pakistan, you created a WordPress site because you were too lazy to send emails to friends and family. The Internets loved you and your pictures were on the front page of Reddit many, many times. Who could forget that great article about food poisoning in China or that unfortunate misunderstanding with a gentleman in Bangkok…

When you got back everyone was like: “You should write a book!” and you said to yourself: “Why not… Half of the book is already written.” This brings us to the subject of this post: importing a WordPress site as a Booktype book.

Importing

There are a couple of ways you can do it (they can be applied to other systems as well):

  • Use some kind of API to access the data remotely. For this you only need permission to access the data.
  • Write your own plugin to export data from WordPress. Besides knowing how to write a plugin, you need permission to install it on the remote server. You can use the WordPress API to render the articles properly, something you can’t really do with the other two methods.
  • Finally, the method we are using in this post: parsing the XML export file created with the WordPress export plugin.

Importing data can be a complex process and each method has its good and bad points. You probably don’t want a chapter for each WordPress post, because that could mean hundreds and hundreds of chapters in the end. Maybe you want to choose which posts become chapters, and sometimes combine a couple of posts into one chapter. These are just some of the issues.

How it works

As the picture above shows, you should go to Tools/Export and choose what you want to export. Download the “Export File” to your computer. The export file is just an extended RSS file and you can (and should) use it to export the content of your site. Notice that attachments and images are not part of this export file, so we will need to download them separately! Besides, the content is in “raw” format (before anything has been rendered to HTML).

Like I said, the export file is an extended RSS file, and for parsing we use the great feedparser library. The book name can be specified with arguments, but as the default name we use the title of the WordPress site. Booktype cares about two titles. One is the full title of the book and the other is the unique URL name of the book (usually a slugified version of the full title).

from booki.utils.book import createBook

book = createBook(conf['user'],
                  conf['bookTitle'],
                  status = 'new',
                  bookURL=conf['bookTitleURL'])

As you can see, we use the function createBook. It takes a user object as its first argument (someone has to be the owner of the book), followed by the full book title, the default book status (books can have different statuses) and the slugified title. The export file provides us with information about the WordPress administrator and the post author. We could use that information to find or create a Booktype user with that specific info and set that person as the book owner.

Parsing posts

After this we just go through the list of all the posts in the export file and ignore those that don’t have ‘wp_status’ set to ‘publish’. Like I said, the content of each post is in “raw” format. For instance, empty lines in the text mark the start of a new paragraph, and so on. If we were using specific tags to format our content (for instance, to prettify source code), we would not get nice and colorful HTML.

In this example we make just two modifications. One is to create paragraphs in the text and the second is to put the chapter title inside an H2 tag. That is one of the Booktype requirements at the moment.
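The filtering step can be sketched as follows. Note the swap: this sketch uses the standard library's `xml.etree.ElementTree` instead of feedparser, so that it runs without extra packages. The sample export, the function name `published_titles` and the namespace URI bound to the `wp:` prefix are assumptions made for illustration; the URI in particular differs between export versions, so check the header of your own file.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI for the "wp:" prefix; real exports may declare another version.
WP_NS = {"wp": "http://wordpress.org/export/1.2/"}

# Hypothetical, heavily shortened export file.
SAMPLE_EXPORT = """<rss version="2.0" xmlns:wp="http://wordpress.org/export/1.2/">
  <channel>
    <title>My travels</title>
    <item><title>Food poisoning</title><wp:status>publish</wp:status></item>
    <item><title>Unfinished notes</title><wp:status>draft</wp:status></item>
  </channel>
</rss>"""


def published_titles(xml_text):
    """Return the titles of all items whose wp:status is 'publish'."""
    channel = ET.fromstring(xml_text).find("channel")
    titles = []
    for item in channel.findall("item"):
        status = item.findtext("wp:status", default="", namespaces=WP_NS)
        if status == "publish":
            titles.append(item.findtext("title"))
    return titles


print(published_titles(SAMPLE_EXPORT))
```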

content = "\n".join(["< p >%s< /p >" % p for p in content.split('\n\n') if p.strip() != ''])
content = u'< h2 >%s< /h2 >%s' % (chapterTitle, content)

And yes… I have some extra spaces in the P and H2 tags because I am too lazy to figure out how to escape them in the code prettifier. Besides, the code that makes paragraphs is not perfect (but this is just an example).
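For comparison, here is a slightly more forgiving variant of the paragraph step, written for Python 3. It is only a sketch: the function name `to_chapter_html` is made up, and the assumption that the title should be HTML-escaped while the post body is passed through untouched (because it may already contain markup) is mine, not a Booktype rule.

```python
import re
from html import escape


def to_chapter_html(title, raw):
    """Wrap blank-line separated blocks in <p> and prepend the title as <h2>."""
    # Split on one or more blank lines, including lines that hold only whitespace.
    blocks = [block.strip() for block in re.split(r"\n\s*\n", raw)]
    paragraphs = "\n".join("<p>%s</p>" % block for block in blocks if block)
    # Only the title is escaped; the body may legitimately contain HTML.
    return "<h2>%s</h2>%s" % (escape(title), paragraphs)


print(to_chapter_html("Day 1", "First paragraph.\n\n\nSecond paragraph.\n"))
```

Splitting with a regular expression rather than on the literal string '\n\n' means that three empty lines in a row do not produce empty paragraphs.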

Images

Now we parse the post content and search for images. When we find one, we try to download it to our computer. Normally we wouldn’t need to do this, but Booktype really wants to have the entire content of a book locally. When the image has been downloaded successfully, we save it as an attachment and modify the image location. All images must be placed inside the ‘static/’ directory. There are good reasons why it has to be a relative path, but we will not talk about them now.

att = models.Attachment(book = book,
         version = book.version,
         status = stat)

f2 = File(StringIO(data))
f2.size = len(data)
att.attachment.save(fileName, f2, save=True)
fileName = os.path.basename(att.attachment.path)
e.set('src', 'static/%s' % fileName)
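The path rewriting on its own can be tried out without Booktype. The sketch below is Python 3, uses a regular expression instead of a real HTML parser, and only handles double-quoted src attributes, so treat it as an illustration rather than the importer's method. The name `localize_images` and the sample URL are hypothetical, and the download itself is left out.

```python
import os
import re
from urllib.parse import urlparse

# Matches <img ... src="..."> with a double-quoted src attribute only.
IMG_SRC = re.compile(r'(<img\b[^>]*?\bsrc=")([^"]+)(")', re.IGNORECASE)


def localize_images(html):
    """Point every <img> at static/<file name>; return the new HTML and the original URLs."""
    found = []

    def swap(match):
        url = match.group(2)
        found.append(url)  # the caller would download these
        file_name = os.path.basename(urlparse(url).path)
        return "%sstatic/%s%s" % (match.group(1), file_name, match.group(3))

    return IMG_SRC.sub(swap, html), found


post = '<p><img class="alignnone" src="http://example.com/wp-content/uploads/2012/05/cow.jpg" /></p>'
new_html, urls = localize_images(post)
print(new_html)
print(urls)
```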

Chapter

At the end we just create a new Chapter. This is a very basic way of creating a new chapter, because we are not leaving any traces in our logs and so on. We can leave that for some other example. Also notice that we are not doing anything with links to other posts/chapters on the same site/book. We just ignore them for now.

chapter = models.Chapter(book = book,
                         version = book.version,
                         url_title = bookiSlugify(chapterTitle),
                         title = chapterTitle,
                         status = stat,
                         content = content,
                         created = now,
                         modified = now)

chapter.save()

Source

Look at the full source of the WordPress importer in my Booktype Scrolls repository. The purpose of this code is to be more educational and less production ready.

As you can see, the technical part of the importer is very simple. Tumblr has an API for remote access and it would be a fairly simple task to make a Tumblr importer (or one for any other CMS). The biggest problem is still how to make their HTML work nicely inside Booktype. If the author is using custom CSS on the WordPress side, Booktype would be completely unaware of it. For instance: <img class="alignnone size-full wp-image-591" … It would be the same for different WordPress plugins, broken HTML, headings in the wrong place, JavaScript plugins and so on.

Booktype Scrolls

I decided to put different kinds of scripts, snippets and small Django apps in a separate project called “Booktype Scrolls”. The idea is to show how Booktype can be extended and how some of its functionality can be used.

For now I have the WordPress importer, a global search-replace and a couple of snippets for getting some basic statistics from the database. For version 0.1.3 this is more than satisfactory, but I plan to extend it every couple of days with something new. So… stay tuned!


Friedrich Nietzsche and thumbnails from images.google.com

It looks like this is the week of Python five-liners. Anywho, I am working on a backend for a hax0r portal. Basically, I am building an unsuccessful reincarnation of Metafeed in the form of a front page for the Razmjena vještina site. The grand presentation should take place at “Ništa se neće dogoditi”. After that I am sure it will be your first (if not your only) feed to sip your morning coffee with.

So… one of the things I need for the portal backend is the operation “get me a random image from images.google.com”. I expected some HTML parsing with BeautifulSoup, but my dear Google pleasantly surprised me. They spit back a pile of JavaScript that generates the page in the browser itself. Using a plain regular expression I grab the contents of that JavaScript Array (as a string). Conveniently, in this case the JavaScript array syntax is identical to that of Python lists, so I use plain Python eval to turn it into a Python list. Even if I had put this on my wish list for Santa Claus (a.k.a. the guy from the Coca-Cola ad), it would not have turned out this easy.
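Calling eval on text that arrives from the network will execute whatever that text contains. A more cautious variant of the same trick, sketched here in Python 3, swaps eval for `ast.literal_eval`, which accepts only literals (strings, numbers, lists and so on). The sample string is a made-up, shortened stand-in for the real response, and the approach assumes the array holds nothing but Python-compatible literals (JavaScript `true`, `false` or `null` would make it fail).

```python
import ast
import re

# Hypothetical, shortened stand-in for the JavaScript the page returned.
SAMPLE = (
    '<script>;dyn.setResults([["a", "b", "c", "http://example.com/1.jpg"], '
    '["d", "e", "f", "http://example.com/2.jpg"]]);</script>'
)

PATTERN = re.compile(r";dyn\.setResults\(\[(.+)\]\);</script>")


def parse_images(data):
    """Extract the array literal and parse it without executing any code."""
    match = PATTERN.search(data)
    if match is None:
        return []
    # The group holds the array contents without the outer brackets,
    # so they are added back before parsing.
    return ast.literal_eval("[" + match.group(1) + "]")


images = parse_images(SAMPLE)
print(images[0][3])
```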

Of course, this whole trick will only work until Google decides to change the syntax. The goal of the script is to show briefly how I solved the problem of getting an image URL (and to fill the blog with this bit of trivia). That is why I did not clutter the code with checks for errors and invalid results. The num argument for setting the number of results per page does not really work for me, so I use the -p argument, which says which page of results to go to.

Usage:

skini.py -p 4 krava muzara
skini.py -r svinjska gripa

The skini.py script:

#!/usr/bin/python

import urllib2, urllib, re, sys, getopt, random

def fetchURL(query, start = 0):
    req = urllib2.Request('http://images.google.com/images?hl=en&q=%s&gbv=2&aq=f&oq=&aqi=g10&start=%s' % (urllib.quote_plus(query), start))
    req.add_header('User-Agent', 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.1.1pre) Gecko/20090701 Ubuntu/8.10 (intrepid) Shiretoko/3.5.1pre')
    req.add_header('Accept', 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8')
    req.add_header('Accept-Language', 'en-us,en;q=0.5')
    req.add_header('Accept-Encoding', 'deflate')
    req.add_header('Accept-Charset', 'ISO-8859-1,utf-8;q=0.7,*;q=0.7')
    req.add_header('Connection', 'close')

    r = urllib2.urlopen(req)
    data = r.read()

    return data

def parseImages(data):
    p = re.compile(";dyn\.setResults\(\[(.+)\]\);\<\/script\>")
    m = p.search(data)
    lst = eval(m.group(1))

    return lst

if __name__ == '__main__':
    try:
        optlist, args = getopt.getopt(sys.argv[1:], 'rp:')
    except getopt.GetoptError, err:
        print "skini.py [-p <page number>] [-r] args\n"
        print str(err)
        sys.exit(-1)

    start, isRandom = 0, False

    for arg, value in optlist:
        if arg == '-p':
            start = value
        if arg == '-r':
            isRandom = True

    data = fetchURL(' '.join(args), start = start)
    images = parseImages(data)

    if not isRandom:
        for img in images:
            print img[3]
    else:
        print random.choice(images)[3]

Beautiful soup in Google's little bowl of services

“Hey Aco..” – Srki interrupted me in ___________ while I was slacking off at work. He wanted to know… whether I could move the content of “prodaja Ega/hiperinflacija Emocija” from http://srdjansandic.blog.com/ to blogger.com. Whether I could… pfff….

Since Aco has never turned down a free beer, I decided to put together a small script to do it for him. Apparently blogger.com (says Srki, I did not check) cannot import it by itself. blog.com has no public API for exporting data, but blogger/blogspot does have one.

Quick with ideas as I am, I took a brief look at
http://srdjansandic.blog.com/. After analysing the pages with my favourite Firebug, I settled on the shortest possible solution (the template has changed a bit in the meantime, but at the time I did the migration the code made sense).

So…. speaking in the vocabulary of seasoned cyber hackers and internauts, I decided on the following:

grab_all_archive_links from page "http://srdjansandic.blog.com/";

loop_over_those_pages { | page |

  grab_all_permalink_links from page;

  -- LoL !
  loop_over_those_pages { | pag3 |
      grab_page_content pag3;
      extract_from_dom [title, date, content] -> save_to_database;
  }
}

The scheme is as follows. I grab the front page. The front page contains links to the archive pages. I take those links and download each of those pages separately. Each one holds a list of all posts (easy, because everything is on one page) with part of their content. I loop over each of those posts and take the link to its permalink. I fetch each of those permalink pages and extract the title, the publication date and the content from it. Comments could be pulled the same way, but Srki said there was no need for that.

Then I store the extracted content somewhere (in my case sqlite3). Since Blogger has some kind of spam protection, it accepts only a certain number of posts per day. Through the web interface you can post more, but you have to fill in a CAPTCHA. I did not see any option for that in the GData API, so every day I fire off some 50 new articles from the database.
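The daily batching can be sketched with the sqlite3 module from the standard library (Python 3 here). The table layout, the `posted` flag and the function names are assumptions made for this illustration, and the actual call to Blogger is left out; in a real run it would be safer to mark a row as posted only after the post has gone through.

```python
import sqlite3

DAILY_LIMIT = 50  # roughly the batch size mentioned above


def demo_db(count):
    """Create an in-memory database holding `count` hypothetical scraped posts."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT, posted INTEGER DEFAULT 0)"
    )
    conn.executemany(
        "INSERT INTO posts (title) VALUES (?)", [("post %d" % i,) for i in range(count)]
    )
    conn.commit()
    return conn


def next_batch(conn, limit=DAILY_LIMIT):
    """Return up to `limit` unposted rows, oldest first, and mark them as posted."""
    rows = conn.execute(
        "SELECT id, title FROM posts WHERE posted = 0 ORDER BY id LIMIT ?", (limit,)
    ).fetchall()
    conn.executemany("UPDATE posts SET posted = 1 WHERE id = ?", [(row[0],) for row in rows])
    conn.commit()
    return rows


conn = demo_db(120)
sizes = [len(next_batch(conn)) for _ in range(4)]  # four "days" in a row
print(sizes)
```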

For all of this I used Beautiful Soup, a killer library that has come to my rescue many times so far. It copes with bad HTML, makes it easy to extract content and takes care of encodings. It even says so on their front page 🙂 The incomplete example that follows (which I have not even tried) shows how Beautiful Soup is used and how I applied the algorithm above to Srki's site.

import urllib
from BeautifulSoup import BeautifulSoup

def fetchURL(url):
    return BeautifulSoup(urllib.urlopen(url).read(), fromEncoding="utf-8")

soup = fetchURL("http://srdjansandic.blog.com/")

for link in soup.find('div', {"class":"block slideshow"}).findAllNext("a"):
    archive = fetchURL(link["href"])

    for post in archive.find('div', {"id": "posts", "class": "posts"}):
        for plink in post.findAllNext("span", {"class": "permalink"}):
            clanak = fetchURL(plink.next["href"])

            title = clanak.find("h4", {"class": "posttitle"}).next
            datum = clanak.find("h3", {"class": "date"}).next
            text = clanak.find("div", {"class": "posttext"}).next

The part that pulls the data out of the database and posts it to blogger/blogspot through the GData Python client is missing.

A small visual addition to the toy cars

I held this addition back until my father had entered a certain number of models into the system. There are some wrongly entered names and duplicates, but I will fix those over time. In any case, the pictures now carry a small stamp with the logo, name, size and the URL of the catalogue.

The stamp is added automatically when a picture is uploaded, or through Django management scripts. I played with the look for a while, but this background colour, amount of transparency and small link seem satisfactory to me.

This catalogue is nothing special (which still does not mean that just anybody could build it), but for the crowd that collects model cars it is still light years ahead of anything they have or will have any time soon. There is some chance of, and interest in, my offering others hosting for their collections in the catalogue, but I have neither the interest nor the time for that. If somebody wants to start a narrowly specialised catalogue of model cars/planes/ships/houses/trains, there is an idea for them. This is a crowd that will happily pay some money every year to put all of their toy cars online.

http://autici.binarni.net/

How Aco picks his friends

The ad about new friends drew varied reactions, from offended acquaintances to the idea that my flat is a public playroom. In any case, like every craftsman, I like to use the tools I work with every day to make my life easier. The thing is fairly modular, so it can easily be extended with Aco's new conditions. Support for profiles (Aco is cheerful, Aco is depressed, it is snowing outside and Aco is content) would certainly come in handy.

Everyone goes through this priority queue.


from heapq import heappush, heappop
import datetime

DOBRO, VALJA, JEBENO  = 0, 1, 2
NECE, HOCE = False, True
MUSKO, ZENSKO = False, True

ACINE_GODINE  = datetime.date.today().year-1976
FAKTOR_GODINA = -20

acin_um = {"vrckast":  lambda val: -2,
           "stan":     lambda val: val == True and -10 or 1,
           "auto":     lambda val: val == True and -5 or 0,
           "star":     lambda val: round(abs(ACINE_GODINE-val)/100.0*FAKTOR_GODINA),
           "wii":      lambda val: val == True and -5 or 0,
           "udan":     lambda val: val == True and 10 or 0,
           "zauzet":   lambda val: 2,
           "spol":     lambda val: val == True and -10 or 0,
           "ima_maca": lambda val: 0,
           "sise":     lambda val: (-2,-5,-10)[val],
           "guza":     lambda val: (-2,-5,-10)[val]
           }

def create_prijatelj(name, **kw):
    return (reduce(lambda a, b: a+b, map(lambda opc: acin_um[opc](kw[opc]) , kw.keys())), name)

potencijalni = []

heappush(potencijalni, create_prijatelj("osoba jedan",
                                        sise    = JEBENO,
                                        guza    = VALJA,
                                        stan    = False,
                                        vrckast = True)
         )

# etc.. etc... etc..

heappush(potencijalni, create_prijatelj("osoba n",
                                        wii     = HOCE,
                                        udan    = True
                                        )
         )

print "* drum roll *"
print "The best candidate for Aco's new friend is: ", heappop(potencijalni)[1]

Goran Zec versus the blogs of the axis

Disclaimer

So, this is just a joke and I know Goran will not hold it against me; for the sources, scroll past the first few paragraphs…

How it all began

A new superhero has appeared in town and, as things currently stand, it is not “Aco Pretnja”! There is a poll at http://www.galoviceva-jesen.com/blog.asp for the best blog (whatever). The cheerful and delighted candidates took to their web 2.0 social sites and started persuading their faithful readers to give them a vote or two. This is where things get a bit more interesting. Just as with those high-school protests, this time too Facebook made a mess!

As this FriendFeed screenshot shows, Rusulica used Twitter to send her faithful subscribers a link where they could vote for her. Naturally, they started clicking! I admit, I cast my vote too. I am a faithful reader of Rusulica's texts and a subscriber to her YouTube videos and her emo stream on last.fm. Anywho, Goran gets it…. Goran was not born yesterday, and he immediately tried turning off cookies in his Firefox in order to give our Rusulica a few extra votes. He eventually succeeds, but one thing catches his eye: the votes for Srđan are growing far too fast!

Average clicking speed

Goran has been in this business for quite a while and he knows that the ACS (average clicking speed) cannot be this high. As a Firefox power user (with some 30 plugins installed on his system) he picks up iMacro and feeds his Firefox sequences of commands to automate the clicking itself. He racks up an extra 1300 points for Rusulica….. Go Rusulica!

Let's measure his head!

Who is this Srđan and when did he become the axis of blogs (an obvious allusion to Bush's axis of evil)? Srki is a “big-headed gel” who freelances for 12 kuna in the den of the NGO mafia. He idles his days away surfing at the desk, writing his blog, chatting on all sorts of IMs and messaging over Facebook. Just like Rusulica, he called on his friends over Facebook to vote for him.

So will yooou vote for me

Several things happen at this point. While all of this was heating up, Srki was working at m.a.m.a., which at that moment was packed with “gels” just like him who had come to one of the “Queer Zagreb” events taking place there. Many of them know Srki, and many of them, at his urging, walked over to the computer and clicked once for Srki. Likewise, Srki told all of his online friends (so at least 100) to vote for him. He surely also went to numerous dating sites to spread the word about his heavenly voting success, and I would not be surprised if he used the office phone at mama to call every contact in his address book and direct them to the honourable act of voting! Real activism at work. Had he tried this hard when Teo needed votes for “Pravo na grad”, there would be no construction work on Cvjetni now.

What have you done!

I have always said that the gays would be the death of us, but few people listen to me. In any case, Srki's more disciplined acquaintances did their part. To somebody, that small jump in votes could only mean one thing: someone is hacking! Posting your own bare female backside on a blog is a good way to get people to follow it, but sometimes not reason enough for them to rally this hard around some online poll…. Anywho, Goran pumped in 1300 votes, then others who figured out his dishonourable scheme pumped the same number of votes into the rest, and now Goran, in the manner of a high-school sophomore and still convinced of a worldwide blog conspiracy, is pumping thousands upon thousands of extra votes into Rusulica….. O wretched Goran! WHAT HAVE YOU DONE, WRETCHED GORAN!

What if…

As someone who has built systems like this for processing online polls (and similar online things), I have to admit that this problem deserves a word or two.

Say we are dealing with a prize contest, and say it is very important to run the contest or vote as fairly as possible. Relying on cookies alone is not enough. Whichever ASP developer built this contest that Goran disfigured made a catastrophic mistake. OK, maybe in Internet Explorer you have to click through some not-so-accessible places in the options, so people do not know about it, but turning off cookie support is a quick way to vote indefinitely. Writing a short script that does it for you is even easier.

If you try to enforce the restriction “one vote == one IP”, you run into the problem of “what about people in, say, a cyber cafe?” What about three family members who want to vote separately through their local internet provider? In our case it would mean that Srki could get only one vote from mama's cyber cafe (read: the place where he freelances). That is definitely not good. Had I built the poll at http://www.galoviceva-jesen.com/blog.asp, I would definitely have limited the number of votes that can come from one IP within a given time interval. That is: you voted just now, so you can vote again in n minutes, but you also cannot vote from that IP 100 times in the last m hours. Since I freelanced at mama myself, that outward-facing mama IP and the voting of highly placed slackers have stuck in my memory 🙂
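Such a limit can be sketched as a sliding window per IP address. This is only an illustration in Python 3 under a few assumptions: the class name `VoteLimiter` is made up, the state lives in the memory of a single process (a real site would keep it in a database or a shared cache), and the addresses are documentation examples.

```python
import time
from collections import defaultdict, deque


class VoteLimiter:
    """Allow at most `max_votes` per IP within a sliding window of `window` seconds."""

    def __init__(self, max_votes, window):
        self.max_votes = max_votes
        self.window = window
        self.history = defaultdict(deque)  # ip -> timestamps of accepted votes

    def allow(self, ip, now=None):
        now = time.time() if now is None else now
        stamps = self.history[ip]
        # Forget votes that have fallen out of the window.
        while stamps and now - stamps[0] >= self.window:
            stamps.popleft()
        if len(stamps) >= self.max_votes:
            return False
        stamps.append(now)
        return True


limiter = VoteLimiter(max_votes=2, window=60)
results = [
    limiter.allow("192.0.2.1", now=0),
    limiter.allow("192.0.2.1", now=10),
    limiter.allow("192.0.2.1", now=20),   # third vote inside the window
    limiter.allow("192.0.2.1", now=61),   # the vote from t=0 has expired
]
print(results)
```

Passing `now` explicitly is just a convenience for trying the limiter out; in normal use it falls back to the current time.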

Of course, this also depends on the specific situation. If the site gets 5 hits per second and they all vote for the same thing, it is more than obvious that somebody is faking. So it pays to limit the number of votes for a given item (from all IPs). How many? That depends on the specific situation.

If I had to build a vote now, on top of all this I would definitely add a CAPTCHA. That certainly drives away cheap faking attempts, but unfortunately it offers no protection against “if you really care, you pay $50 to people in India and they click all day”.

And how I would do this

Had I wanted to get into this business with this particular poll, I would have done it like this. It is nothing complicated and needs no fiddling with Firebug, so I would just go to the page and look at the source. It plainly says that it does a POST to the URL /anketa/default.asp?act=1&pid=4. Some irrelevant validation runs on onSubmit. There are three radio buttons named answer. The possible values are “11”, “12” and “13”. All of them are mistakenly marked “checked”, which in this case means that Rusulica's blog is automatically selected as the default. So if you build your own polls, take care not to make mistakes like this.

<form name="frmpoll" id="frmpoll" method="post" action="/anketa/default.asp?act=1&pid=4" onsubmit="javascript: return PollVoteFormValidate();"><b>Ocjenite najbolji blog</b><ul id="answers" class="ulist"><li><input name="answer" checked type="radio" value="11" />Garden of Arcane Delights</li><li><input name="answer" checked type="radio" value="12" />Prodaja Ega-Hiperinflacija emocija by "Srdjan Sandic"</li><li><input name="answer" checked type="radio" value="13" />Rusulica</li></ul><input type="submit" class="button" title="glasaj" value="glasaj" /><br/><a href="/anketa/default.asp?pid=4">rezultati</a><br /><br/>glasalo je <strong>10527</strong> osoba<br />glasanje do: <b>10.10.2008</b></form>

I am no hacker, so I would not use any plugins; I would write it in plain peasant style, for example like this:


wget --no-cookies --post-data 'answer=11' "http://www.galoviceva-jesen.com/anketa/default.asp?act=1&pid=4"

The line above pumps extra votes into that third blog, which ended up here through no fault of its own. Put it in a loop, run it on several different computers at the same time, and voilà!

OK.. and now, what if they had, for example, checked that only one vote could come from one IP, or only n votes per hour….

There are several ways to pull this off, but here is a brutally banal one. Its big advantage is that you can control which addresses you attack from. Because if the vote is for the Croatian blog of the year, it is easy to see that 1000 votes from Kazakhstan smell a bit fishy.

So the secret is in anonymous HTTP proxies. Go to Google and download one of the current proxy lists, some 1000 entries say, and copy it into this script.

import os, random

SERVERI = """218.249.12.133:8080       anonymous proxy Oct-03, 14:49   China
88.191.60.104:3128      anonymous       Oct-03, 14:46   France
... itd itd itd ..."""

LISTA = [x.split("\t")[0] for x in SERVERI.split("\n")]


n = 0
# rack up 100 votes
while n < 100: 
    # pick a random proxy from the whole list
    proxy_server = random.choice(LISTA)

    # build the wget command
    #  - set the http proxy
    #  - write the output to /dev/null
    #  - time out after 10 seconds
    #  - do not retry after a timeout
    #  - do not use cookies
    #  - post the data that gives an extra vote to that first
    #    pointless blog

    wget_command = "http_proxy=\"%s\" wget -O /dev/null --timeout=10 --tries=1 --no-cookies --post-data 'answer=11' \"http://www.galoviceva-jesen.com/anketa/default.asp?act=1&pid=4\" " % proxy_server

    result = os.system(wget_command)

    # remove it from the list of known proxy servers
    # if it timed out
    if result != 0:
        LISTA.remove(proxy_server)
        print "Removing ", proxy_server, " from the list."
    else:
        n += 1

It cannot get any simpler. What is worth doing is, for example, dumping the list of these proxies into an sqlite database, because right now it retries all the HTTP proxies from scratch on every run. Put them in a database, add an optional pause between requests, allow a given proxy to be used only every n minutes, let different processes (meaning you start it 100 times) access that data, and there you have heaps of votes from every country of our lovely Europe.

One of the things you need to pay attention to is something called concurrency. Your online application (say, a vote) can be accessed by several users at the same time. What you need to watch are all the resources that are shared between different sessions. Normally this is not a big problem, because the site gets 5 hits a day and nobody notices, but once you add an infinite loop and 5 processes hitting the site at the same time, and some slacker wrote the scripts (because these days scripts get copy-pasted from the web and from tutorials, everybody knows that), you could put his precious creation to the test.

As the picture shows, imagine that two people come to the web page at the same time and vote. The voting script reads the current number of votes, increments it by one and stores the new value in the database. In the picture above the value in the database should end up as 7, but it will be 6. Problems with shared resources show up differently depending on the specific situation, but in web applications they usually involve the data in the database itself or files on the file system. In our case a wrong value gets written, but what can just as easily happen is all kinds of exceptions when accessing resources that are already open or locked. So give your script a bit of a timeout, so that the little scripts on the other side get some time to breathe....
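The lost update can be reproduced on a small scale with sqlite3 (Python 3). In this sketch the two "visitors" are simulated by interleaving their steps by hand on one connection, which is an assumption made to keep the example deterministic; the table layout and function names are likewise made up. One common fix, shown alongside, is to let the database perform the increment in a single statement.

```python
import sqlite3


def make_db(votes):
    """Create an in-memory poll with one answer that starts at `votes`."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE poll (answer INTEGER PRIMARY KEY, votes INTEGER)")
    conn.execute("INSERT INTO poll VALUES (11, ?)", (votes,))
    conn.commit()
    return conn


def read_votes(conn):
    return conn.execute("SELECT votes FROM poll WHERE answer = 11").fetchone()[0]


def lost_update(conn):
    """Two read-modify-write requests where both read before either writes."""
    seen_by_a = read_votes(conn)
    seen_by_b = read_votes(conn)
    conn.execute("UPDATE poll SET votes = ? WHERE answer = 11", (seen_by_a + 1,))
    conn.execute("UPDATE poll SET votes = ? WHERE answer = 11", (seen_by_b + 1,))
    conn.commit()


def atomic_update(conn):
    """Two requests that leave the increment to the database itself."""
    for _ in range(2):
        conn.execute("UPDATE poll SET votes = votes + 1 WHERE answer = 11")
    conn.commit()


racy, safe = make_db(5), make_db(5)
lost_update(racy)
atomic_update(safe)
print(read_votes(racy), read_votes(safe))
```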

There... that is all from Aco Pretnja.