By Filip Salomonsson; published on January 10, 2010.
Week 1 is drawing to a close. Besides a client meeting, planning, and some coding for fun, I also ran into some technical problems. An outage on my regular internet connection made it clear that being left without an alternative in situations like that just doesn't work.
Telenor Mobilt Bredband Kontant looks like a good backup solution so far. 500 kronor for the modem and SIM card, and then you can pay 90 kronor whenever you like for a week of connectivity. It seems to work just fine, even under Linux, so that will be this week's tip for the self-employed.
This week's snapshot memory was the look on the Telenor salesman's face when he had just promised 6 Mbit downstream and 1 Mbit up, and I pointed at the box he had just handed me, which said 3 Mbit down and 0.3 up. Salespeople in the telecom industry often have a relaxed attitude to the truth. A clear image problem.
I also got fed up with how awfully fiddly it is to get to your "favorites" on Hemnet when you arrive via an alert email or RSS feed, and mocked up an example of how they could be made more accessible directly from other property listings. There's a separate blog post germinating in there somewhere.
Sunday links
By Filip Salomonsson; published on January 03, 2010.
In 2005 I started teaching language technology at Uppsala University. Since 2007 I have worked on a research project there. Last Thursday my employment ended.
Now I am self-employed full time, helping companies be as valuable as possible to their customers on the web. I need to polish my elevator pitch, but that will have to do for now. It's about making web services less annoying and more helpful.
This is week 0, which I'm wrapping up by emptying my inbox and planning for week 1.
Sunday links
Emptying the inbox, that was it. Good grief.
By Filip Salomonsson; published on September 03, 2009.
Tags: bugs profiling python
Consider a very small Python program, test.py:
label = "foo"
And then consider profiling that program with the very nice cProfile module:
$ python -m cProfile test.py
Finally, consider the consequences:
Traceback (most recent call last):
  File ".../lib/python2.5/runpy.py", line 95, in run_module
    filename, loader, alter_sys)
  File ".../lib/python2.5/runpy.py", line 52, in _run_module_code
    mod_name, mod_fname, mod_loader)
  File ".../lib/python2.5/runpy.py", line 32, in _run_code
    exec code in run_globals
  File ".../lib/python2.5/cProfile.py", line 190, in <module>
    main()
  File ".../lib/python2.5/cProfile.py", line 183, in main
    run('execfile(%r)' % (sys.argv[0],), options.outfile, options.sort)
  File ".../lib/python2.5/cProfile.py", line 36, in run
    result = prof.print_stats(sort)
  File ".../lib/python2.5/cProfile.py", line 81, in print_stats
    pstats.Stats(self).strip_dirs().sort_stats(sort).print_stats()
  File ".../lib/python2.5/pstats.py", line 92, in __init__
    self.init(arg)
  File ".../lib/python2.5/pstats.py", line 106, in init
    self.load_stats(arg)
  File ".../lib/python2.5/pstats.py", line 130, in load_stats
    arg.create_stats()
  File ".../lib/python2.5/cProfile.py", line 92, in create_stats
    self.snapshot_stats()
  File ".../lib/python2.5/cProfile.py", line 100, in snapshot_stats
    func = label(entry.code)
TypeError: 'str' object is not callable
(File paths shortened because mine are horribly long.)
Now consider stabbing your heart out with a fork. Though perhaps I should see if I can fix it instead, and submit a patch.
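The last frames of the traceback hint at the cause: cProfile's own code looks up a helper called label and finds the string from test.py instead. Here is a minimal sketch of that kind of namespace clash, written for Python 3 and assuming the script is executed in the same globals as the profiler's helpers. The names profiler_globals, script_globals and user_source are hypothetical stand-ins, not cProfile's actual internals.

```python
def label(code):
    """Stand-in for a helper that a profiler module keeps in its globals."""
    return "label for %r" % (code,)

user_source = 'label = "foo"'  # the whole of test.py

# Case 1: the script runs in the profiler's own namespace,
# so its assignment replaces the helper with a string.
profiler_globals = {"label": label}
exec(user_source, profiler_globals)
clobbered = not callable(profiler_globals["label"])

# Case 2: the script gets a namespace of its own (hypothetical remedy),
# and the helper survives.
profiler_globals = {"label": label}
script_globals = {"__name__": "__main__"}
exec(user_source, script_globals)
isolated = callable(profiler_globals["label"])

print(clobbered, isolated)
```

Giving the script a namespace of its own, as in the second case, is one way a clash like this could be avoided.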
By Filip Salomonsson; published on June 18, 2009.
Tags: python
Or, a random act of senselessness (which is a nice word).
>>> class s(str):
...     def __sub__(self, other):
...         return "".join(chr(c) for c in range(ord(self), ord(other)+1))
...
>>> s("a") - s("g")
'abcdefg'
Never do this sort of shit. Thank you.
By Filip Salomonsson; published on May 10, 2009.
Tags: python streamxmlwriter xml
I just uploaded streamxmlwriter 0.2 to PyPI.
Streamxmlwriter is my library for flexible size-independent XML writing, including pretty-printing and custom attribute sorting. Try it out (both easy_install streamxmlwriter and pip install streamxmlwriter should work) or dig through the source code on GitHub.
Namespace support is still experimental, and the documentation is a bit on the thin side, but you should be able to use it for Real Work. (I do.)
I'll show it off in a post of its own when it's a bit more mature.
By Filip Salomonsson; published on May 10, 2009.
Tags: elementtree lxml python xml
The ElementTree API makes XML processing in Python a breeze, and the
iterparse function alone can probably handle 80% of your XML
processing needs. I love it.
But did you know you can lose data with it if you're not careful?
Don't worry - it's not a bug, but there are edge cases you should be
aware of.
The problem
The documentation is clear:
iterparse() only guarantees that it has seen the ">" character of a
starting tag when it emits a "start" event, so the attributes are
defined, but the contents of the text and tail attributes are
undefined at that point. The same applies to the element children;
they may or may not be present.
If you need a fully populated element, look for "end" events instead.
As a rule, you should only use start events to inspect and/or modify
the element's tag and its attributes.
You probably knew that already.
If you follow the link from Fredrik Lundh's iterparse
page to a python-sig message from 2005, you'll see
something that may not be as well known: the availability of the
tail attribute during end events isn't guaranteed either.
You may not have known that.
The suggested remedy for the text attribute is simple: only touch it
on end events. In most cases, you never even look at start events
anyway, so that's a fine solution.
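As a rough illustration of that rule, the sketch below reads attributes on start events and text on end events only. It is written for Python 3 (io.BytesIO and xml.etree.ElementTree rather than the Python 2 modules used later in this post), and the tiny document with its id attributes is made up for the example.

```python
import io
import xml.etree.ElementTree as etree

# Hypothetical sample document.
doc = io.BytesIO(b'<doc><foo id="1">hello</foo><foo id="2">world</foo></doc>')

ids, texts = [], []
for event, elem in etree.iterparse(doc, events=("start", "end")):
    if elem.tag != "foo":
        continue
    if event == "start":
        # Tag and attributes are defined as soon as the start event fires.
        ids.append(elem.get("id"))
    else:
        # The text attribute is only touched once the element has ended.
        texts.append(elem.text)

print(ids, texts)
```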
But what about tail? I rarely use XML documents that have tail
data, but when I do, this is an important issue. To be
sure not to lose data, you'll have to do something about it.
Luckily, there's a simple solution, but first, let's look at why this
happens.
The cause
It all has to do with how the parsing works.
iterparse feeds data to the parser in 16-kilobyte chunks, and it
fires off all events it can for each chunk. Then the events are handed
over to you, one by one.
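The feed-then-drain cycle can be imitated by hand. This sketch assumes Python 3.4 or later, where xml.etree.ElementTree has an XMLPullParser class, and it uses two tiny hand-made chunks instead of real 16-kilobyte reads. Exactly which events show up after each feed depends on the underlying parser, so the sketch only prints them.

```python
import xml.etree.ElementTree as etree

parser = etree.XMLPullParser(events=("start", "end"))
seen = []    # every (event, tag) pair, in the order it was handed out
elems = {}   # tag -> element, so the result can be inspected afterwards

def drain():
    """Collect the events the parser has been able to produce so far."""
    batch = []
    for event, elem in parser.read_events():
        elems[elem.tag] = elem
        batch.append((event, elem.tag))
    seen.extend(batch)
    return batch

# Two hand-made "chunks", split in the middle of the text.
for chunk in ("<doc><foo>he", "llo</foo></doc>"):
    parser.feed(chunk)
    print(repr(chunk), "->", drain())

parser.close()
drain()  # pick up anything that was still pending at close

print(seen, elems["foo"].text)
```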
Say there's a foo element whose content is the text "hello".
...<foo>hello</foo>...
As long as all of the text is in the same chunk as the preceding ">", the text attribute
will be set during the start event. We can try it out:
>>> import xml.etree.cElementTree as etree
>>> from cStringIO import StringIO
>>> doc = StringIO("<doc><foo>hello</foo></doc>")
>>> for event, elem in etree.iterparse(doc, ("start", "end")):
...     print event, elem.tag, elem.text or ""
...
start doc
start foo hello
end foo hello
end doc
On the other hand, if a chunk ends in the middle of that text (or
immediately after the start tag, before the text), iterparse will
hand you a start event for the foo element without the text
attribute set, and the parser comes back and sets it when it's
processing the next chunk and reaches the end of the element.
          |
...<foo>he|llo</foo>...
          |
Let's trigger this by adding a long comment before the foo element.
>>> padding = "x" * 16365
>>> doc2 = StringIO("<doc><!--%s--><foo>hello</foo></doc>" % padding)
>>> for event, elem in etree.iterparse(doc2, ("start", "end")):
...     print event, elem.tag, elem.text or ""
...
start doc
start foo
end foo hello
end doc
Now the chunk ends after "he", and this time the foo element's
text attribute isn't set during the start event.
The issue with tail is exactly the same. We can trigger this by
using a long comment again. This time, we'll use an empty foo
element. The first chunk now ends after the "h" in "hello".
>>> doc3 = StringIO("<doc><!--%s--><foo/>hello</doc>" % padding)
>>> for event, elem in etree.iterparse(doc3, ("start", "end")):
...     print event, elem.tag, elem.tail or ""
...
start doc
start foo
end foo
end doc
No tail text to be seen.
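The chunk boundaries in these two examples can be double-checked with plain string slicing. The sketch assumes the 16-kilobyte (16384-byte) read size described earlier; since the documents are pure ASCII, characters and bytes line up. CHUNK_SIZE is a name made up for this check.

```python
CHUNK_SIZE = 16 * 1024  # assumed read size: 16 kilobytes

padding = "x" * 16365
doc2 = "<doc><!--%s--><foo>hello</foo></doc>" % padding
doc3 = "<doc><!--%s--><foo/>hello</doc>" % padding

for name, doc in (("doc2", doc2), ("doc3", doc3)):
    first, rest = doc[:CHUNK_SIZE], doc[CHUNK_SIZE:]
    # Show the last few characters of the first chunk and the start of the next.
    print(name, repr(first[-8:]), "|", repr(rest[:10]))
```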
The solution
Both text and tail data end when another start or end tag occurs.
Both of these trigger new events, so we can use a wrapper that stays
one step ahead, making sure the next event has always been triggered
before it lets us see the current one.
Here's our "delayed iterator":
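def delayediter(iterable):
    # Stay one item ahead: an item is only yielded once its successor
    # has been pulled from the underlying iterator.
    iterable = iter(iterable)
    prev = iterable.next()
    for item in iterable:
        yield prev
        prev = item
    # Nothing follows the last item, so hand it over as it is.
    yield prev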
Let's try it out on the last two examples above.
>>> doc2.seek(0) # "rewind" the stringio object
>>> context = etree.iterparse(doc2, ("start", "end"))
>>> for event, elem in delayediter(context):
...     print event, elem.tag, elem.text or ""
...
start doc
start foo hello
end foo hello
end doc
>>> doc3.seek(0)
>>> context = etree.iterparse(doc3, ("start", "end"))
>>> for event, elem in delayediter(context):
...     print event, elem.tag, elem.tail or ""
...
start doc
start foo
end foo hello
end doc
Success! This works both for Fredrik Lundh's ElementTree (which has been in
the standard library since Python 2.5) and for Stefan Behnel's
excellent lxml.
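For anyone on Python 3, here is a hedged port of the same idea. It assumes xml.etree.ElementTree and io.BytesIO in place of cElementTree and cStringIO, and it guards the first next() call, because a StopIteration escaping from a generator is turned into a RuntimeError from Python 3.7 on.

```python
import io
import xml.etree.ElementTree as etree

def delayediter(iterable):
    """Yield each item only after the following one has been fetched."""
    iterator = iter(iterable)
    try:
        prev = next(iterator)
    except StopIteration:
        return  # empty input: nothing to yield
    for item in iterator:
        yield prev
        prev = item
    yield prev  # the last item has no successor to wait for

# Same layout as the tail example: the first 16 KB ends after the "h".
padding = b"x" * 16365
doc3 = io.BytesIO(b"<doc><!--" + padding + b"--><foo/>hello</doc>")

tails = {}
for event, elem in delayediter(etree.iterparse(doc3, events=("start", "end"))):
    if event == "end":
        tails[elem.tag] = elem.tail

print(tails["foo"])
```

By the time the end event for foo is handed over, the end event for doc has already been produced, so the parser has had to read past the tail text.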
So, from now on, all your iterparsing should be text-safe. (With
lxml, there are still special cases where this may not quite
suffice, but we'll come back to that another time.) Happy coding!
Agree? Disagree? Found a bug? Talk back at
filip.salomonsson@gmail.com.
By Filip Salomonsson; published on October 03, 2008.
Tags: asides
How quiet it is in here.
By Filip Salomonsson; published on June 14, 2008.
Tags: businesstobuttons inuse
Per Axbom has blogged daily reports from the From Business to Buttons conference in Malmö:
Both Axbom and Johan Berndtsson from inUse (and a few others) have also been posting to FBTB's Jaiku channel during the conference.
(By all means, also take the opportunity to check out inUse's new website, which they shipped just in time for the conference. Ouch on the URLs, but stylish.)
By Filip Salomonsson; published on May 23, 2008.
Tags: bash unicode
Bash alias of the day. Stuff this into your ~/.bashrc:
alias visws="sed -e 's/ /\o033[37m\xc2\xb7\o033[0m/g' \
-e 's/\t/\o033[37m \xe2\x86\x92 \o033[0m/g' \
-e 's/\r/\o033[37m\xe2\x86\xb5\o033[0m/g'"
Then pipe anything to visws, and you'll get spaces, tabs and carriage returns shown in grey as sweet Unicode characters (which my Django-driven blog cannot show you, embarrassingly). Dots and arrows, basically.
(This will only work if your terminal encoding is utf-8. But it is, right?)
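For readers who find the escape sequences in the sed expressions hard to decode, here is a rough Python rendering of the same three substitutions. It is a sketch rather than a drop-in replacement for the alias. The function name visws is borrowed from the alias, while GREY, RESET and MARKERS are names made up here.

```python
GREY = "\033[37m"   # same ANSI colour code as \o033[37m in the alias
RESET = "\033[0m"

# The UTF-8 byte sequences from the sed expressions, as code points.
MARKERS = {
    " ": "\u00b7",       # \xc2\xb7      middle dot
    "\t": " \u2192 ",    # \xe2\x86\x92  rightwards arrow, padded with spaces
    "\r": "\u21b5",      # \xe2\x86\xb5  downwards arrow with corner leftwards
}

def visws(text):
    """Return text with spaces, tabs and carriage returns made visible."""
    return "".join(
        GREY + MARKERS[ch] + RESET if ch in MARKERS else ch
        for ch in text
    )

print(visws("one two\tthree\r"))
```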
Update: To be clear, the "cannot show you" part is my fault, not django's.
Bonus: Here it is in action.
By Filip Salomonsson; published on May 03, 2008.
Tags: asides
Have you missed me?