Popular free and open source software projects
- IECapt: command line application that renders web pages using Internet Explorer's rendering engine and saves the result in an image file. Released in 2003 as the only tool of its kind, it has been downloaded over 100,000 times since.
- CutyCapt: cross-platform version of IECapt based on the Qt port of the WebKit browser engine. Included in many software repositories like those for FreeBSD and Debian, and used by organizations from the BBC to Quantcast.
- UTF-8 Decoder: efficient low-level character encoding decoding logic with an elegant programming interface. Used in many applications and libraries, most prominently in the Adobe Flash Player.
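The decoding logic can be sketched, much simplified, as a byte-at-a-time state machine — a hypothetical Python illustration of the general approach, not the actual table-driven C implementation, and omitting the overlong and surrogate rejection a production decoder performs:

```python
def decode_utf8(data):
    """Decode UTF-8 byte by byte, yielding code points.

    Simplified sketch: accepts some ill-formed sequences (overlongs,
    surrogates) that a strict decoder must reject.
    """
    cp = 0      # code point being assembled
    need = 0    # continuation bytes still expected
    for b in data:
        if need == 0:
            if b < 0x80:                 # ASCII
                yield b
            elif b >> 5 == 0b110:        # 2-byte lead
                cp, need = b & 0x1F, 1
            elif b >> 4 == 0b1110:       # 3-byte lead
                cp, need = b & 0x0F, 2
            elif b >> 3 == 0b11110:      # 4-byte lead
                cp, need = b & 0x07, 3
            else:
                raise ValueError("invalid lead byte")
        else:
            if b >> 6 != 0b10:
                raise ValueError("invalid continuation byte")
            cp = (cp << 6) | (b & 0x3F)
            need -= 1
            if need == 0:
                yield cp
    if need:
        raise ValueError("truncated sequence")
```

The appeal of the state-machine formulation is that the caller can feed bytes as they arrive, one at a time, without buffering a whole string.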
- pngwolf: losslessly reduces the file size of PNG images by finding better scanline filter combinations using a genetic algorithm. Outperformed all competing stand-alone tools by over 1% in size on average when it was released.
<canvas> element and jQuery UI sliders to control radius and intensity parameters.
Comprehensive Perl Archive Network modules
- SGML::Parser::OpenSP: a wrapper around OpenSP, an SGML processing library originally written by James Clark. Written in C++ and Perl, it powers the W3C Markup Validator for document types pre-dating HTML5 and XHTML5.
- HTML::Encoding: determines the declared encoding of HTML and XML documents, possibly encapsulated in HTTP or MIME messages. My first CPAN module, originally uploaded in July 2001. Used by the W3C Validator and others.
- Parse::ABNF: many Internet protocols are defined as or otherwise include formal languages using ABNF grammar specifications. This module can parse such grammars as a first step to compile them into something else.
- Compress::Deflate7: the Deflate format is a compressed data format used in e.g. HTTP. Usually the zlib library is used, but 7-zip compresses several percentage points better. This module wraps 7-zip's implementation.
- List::OrderBy: experimental module offering functions for sorting data based on multiple keys, like sorting by a “category” first, and then by “size”, so items in a category are kept together, while putting the biggest ones first.
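In Python terms (the module itself is Perl; this only illustrates the behaviour described, not the module's API), such an ordering is a stable sort on a composite key:

```python
items = [
    {"category": "fruit", "size": 3},
    {"category": "veg",   "size": 9},
    {"category": "fruit", "size": 7},
]

# Sort by category first, then by size descending within each category,
# so items in a category stay together with the biggest ones first.
ordered = sorted(items, key=lambda i: (i["category"], -i["size"]))
```

Negating the size puts the biggest items first, while the leading category key keeps each category together.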
- geo: URIs allow linking to geographical locations, and this module provides an interface to them. It turned out there was already a module for this that I had missed when searching for one. I'd like to merge the two; no luck so far.
- Win32::MultiLanguage: makes the IMultiLanguage family of Windows COM APIs, used by Microsoft's Internet Explorer to deal with character encodings, charset labels, Windows code pages, and language codes, available to Perl.
- Math::CMA: computes central moving averages of arrays of numbers. It is intended to work like the curve smoothing function in the ngram viewer made available by Google, as I was working with the same data set at the time.
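A minimal sketch of the idea, assuming — as an illustration, not a statement about the module's exact edge behaviour — that the averaging window simply shrinks near the ends of the list:

```python
def central_moving_average(values, distance):
    """Average each element with up to `distance` neighbours on either
    side; the window shrinks where it would run past the ends."""
    out = []
    for i in range(len(values)):
        lo = max(0, i - distance)
        hi = min(len(values), i + distance + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out
```

With distance 0 the input is returned unchanged; larger distances produce progressively smoother curves, which is the effect the ngram viewer's smoothing slider has.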
- Geo::MedianCenter::XS: given a set of weighted geographical locations, finds the geometric median point on the earth's surface that is in the middle of the points. As a median measure, this point is robust against outliers.
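One standard way to compute a geometric median is Weiszfeld's algorithm; the sketch below runs it in the plane as a simplified stand-in for the spherical computation the module performs, and is an illustration rather than the module's actual code:

```python
import math

def geometric_median(points, weights=None, iters=100):
    """Weiszfeld's algorithm in the plane: iteratively re-weight each
    point by the inverse of its distance to the current estimate."""
    weights = weights or [1.0] * len(points)
    # Start from the weighted centroid.
    total = sum(weights)
    x = sum(w * px for w, (px, _) in zip(weights, points)) / total
    y = sum(w * py for w, (_, py) in zip(weights, points)) / total
    for _ in range(iters):
        num_x = num_y = den = 0.0
        for (px, py), w in zip(points, weights):
            d = math.hypot(px - x, py - y)
            if d == 0:
                continue  # estimate coincides with a point; skip it
            num_x += w * px / d
            num_y += w * py / d
            den += w / d
        if den == 0:
            break
        x, y = num_x / den, num_y / den
    return x, y
```

For three points at the origin and one outlier at (10, 10), the centroid would be (2.5, 2.5) while the geometric median stays at the origin, which is the robustness against outliers mentioned above.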
- Unicode::SetAutomaton: converts a Unicode character class into an equivalent compact byte-oriented regular expression for UTF-8 encoded strings. This is needed to build efficient regular expression engines.
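The effect can be illustrated with Python's re module: a character class over code points and a hand-derived byte-oriented class accept exactly the same strings once the input is UTF-8 encoded. The byte pattern below covers the two-byte range U+0080–U+07FF and is a worked example, not output of the module:

```python
import re

# [\u0080-\u07FF] — every code point that UTF-8 encodes in two bytes —
# becomes, over the encoded bytes, a lead byte C2-DF followed by a
# continuation byte 80-BF.
char_re = re.compile("[\u0080-\u07FF]")
byte_re = re.compile(rb"[\xc2-\xdf][\x80-\xbf]")

for cp in (0x7F, 0x80, 0x3A9, 0x7FF, 0x800):
    s = chr(cp)
    assert bool(char_re.fullmatch(s)) == bool(byte_re.fullmatch(s.encode("utf-8")))
```

A byte-oriented engine working from such classes never has to decode the UTF-8 input at all.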
- Set::IntSpan::Partition: takes a set of sets of integers, and splits them into smaller sets, so that each integer is in only one set. This is useful for regular expressions and finite automata based on character classes.
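A compact way to express the idea (an illustrative Python sketch, not the module's implementation): two integers belong to the same output class exactly when they occur in the same input sets.

```python
def partition(*sets):
    """Split a family of integer sets into disjoint classes, keyed by
    which of the input sets each integer belongs to."""
    classes = {}
    for x in set().union(*sets):
        key = frozenset(i for i, s in enumerate(sets) if x in s)
        classes.setdefault(key, set()).add(x)
    return list(classes.values())
```

For character classes this means overlapping classes like [a-m] and [g-z] can be rebuilt from three disjoint pieces, which is what automaton constructions need.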
- Image::PNG::Rewriter: reads and writes PNG image files with some options to manipulate the image data, like changing the scanline filters. I mainly wrote this to prototype pngwolf in a more dynamic language.
- Graph::NewmanGirvan: a port of the node clustering algorithm implementation in the Java program LinLogLayout. Requires only a graph with weights as input and produces rather reasonable results. Useful for experiments.
- Acme::IEnumerable: experiment to study how to add support for lazy lists to Perl5, with an interface based on the IEnumerable interface in Microsoft's .NET framework. Unfortunately it lacks documentation for most methods.
- AI::CRM114: wrapper around CRM114, a statistical text classifier. I used it to investigate detecting new articles on the German Wikipedia that do not meet minimum quality requirements. I hope to wrap libcrm114 instead when released.
- WebService::Validator::CSS::W3C: an interface to the SOAP API of the W3C CSS Validator, modeled after an existing module for the Markup Validator. The W3C volunteered to maintain the module, but then did not.
Contributions to open source projects
- HTML Tidy: command line application that corrects and re-formats HTML documents. I encouraged making it a library maintained by volunteers, ended up being one of them, and most recently added rudimentary HTML5 support.
- W3C Markup Validator: adding Windows support to the once-monolithic CGI script written in Perl was one of my first contributions to the W3C. Later, among other things, I split the code into independent modules.
- W3C CSS Validator: validates Cascading Style Sheets. When the W3C neglected it, I put it back on life support, collected known issues and frequently asked questions, and eventually modified parts of the Java code through qa-dev.
- OpenSP: SGML processing toolkit used by the Markup Validator. After restoring Win32 support, adding a few needed features, and fixing a couple of bugs, I helped organize the most recent and possibly final release.
- phenny: a heavy user of Sean B. Palmer's family of IRC bots, I have contributed many bug reports, implemented some commands like geo, and wrote Python code to repair the language translation command.
- Perl XML: a long-time member of the Perl and XML communities, I am one of the administrators of the perl-xml project on SourceForge. Among other things, I volunteered to maintain XML::SAX::Expat when Robin Berjon gave it up.
- urlencoded: possible standard variant of the application/x-www-form-urlencoded type used in HTML form submissions, with stricter processing rules and using ; instead of & as parameter separator.
- nomap: formalises a network naming convention originally proposed by Google to keep networks out of well-behaved databases, like geolocation databases that map Wi-Fi networks to geographic coordinates for reverse-lookup.
- nordfriisk: registration information for language variant subtags for North Frisian dialects, making it possible to write <span lang='frr-frasch'>Liiwer düüdj as slååw</span> as it is rendered in my "native" language, Frasch.
- cp-collation: defines a generic collation for Unicode strings that supports equality, ordering, and substring operations and would register the collation in the corresponding registry. Stalled after a mistaken negative review.
- date-parsing: defines a relaxed grammar for dates in HTTP messages to accommodate certain implementation bugs, intended as an alternative to the algorithm in RFC 6265, but ultimately not proposed.
- World Wide Web Consortium: from 1999 onwards, I have actively participated in the development and standardisation of web technologies. The public mailing list archives hold reviews, critiques, proposals, tools, and more.
- XHTML 1.0 translation: a German translation of the first edition of the “XHTML 1.0: The Extensible HyperText Markup Language” W3C Recommendation. Also for XHTML 1.1 and XHTML Basic but I never advertised them.
- CSS metadata: after Bert Bos invited me to participate in W3C's Cascading Style Sheets Working Group as Invited Expert in 2001, this was the first thing I sent to w3c-css-wg, the Working Group's member-confidential mailing list.
- DOM Level 3 Events: after Robin Berjon talked me into joining W3C's Web API Working Group as Invited Expert in 2006, this was the first Technical Report published with me as editor, on 13 April 2006.
- CSS Schema 1.0: the biggest problem with W3C's CSS Validator is that it is code-driven: for every new feature, new Java code has to be written. This was one proposal to change that with schemas. I based a later proposal on RELAX NG.
Proof-of-concept online services
- XHTML checker: syntax validator for the now infamous HTML Compatibility Guidelines of the XHTML 1.0 Recommendation. My code was later made available as a CPAN module by the W3C.
- DIN 1460 app: transliteration of Russian into the Latin alphabet according to the German standard DIN 1460, “Umschrift kyrillischer Alphabete slawischer Sprachen” (transliteration of Cyrillic alphabets of Slavic languages). Uses ruby annotation to visualize the mapping.
- SVG support: data tables with information on which SVG features are supported in various SVG implementations that can be filtered by specific SVG images; submit a file and be told which features will not work in e.g. Firefox.
- abnf2xml: converts ABNF formal grammars as they are often found in protocol specifications into a proprietary XML format based on RELAX NG to ease conversion and analysis. Can be used together with tools to extract grammars.
- Latest W3C mail: overview of the latest mails to public W3C mailing lists, noting the busiest lists and threads. The cron scheduler runs the Perl and XSLT scripts that generate the document automatically every day.
Other published software
- agent2mbox: turns mailbox files in my mail client's format into a standard mbox file, for archival and to allow me to switch mail clients. This required reverse-engineering the proprietary storage format. Uses a Perl version of PyConstructs.
- musicbrainz2sqlite: Perl script that turns MusicBrainz music metadata dumps into a SQLite database using the DBI modules. Allows custom queries without having to install a PostgreSQL database.
- XML Events polyfill: an ECMAScript implementation of the declarative event registration features in SVG Tiny 1.2. Never developed past the experimental stage, it was useful to develop an understanding of the proposed feature.
- Image blur filter: MediaWiki user script that automatically blurs all content images on Wikipedia through an SVG filter, revealing crisp images on hover, allowing users to avoid distressing images. Featured in Wikipedia Kurier.
- IEQaBar: failed attempt at making a toolbar for Internet Explorer that could validate documents from within the document without external tools. Code works, but polishing required skills I lacked and I couldn't find enough help.
- Uninorm: a library written in C that implements the Unicode normalisation forms defined in UAX #15. My first independent open source project after joining sourceforge.net. Shows that I was learning C at the time.
- stylevalid: a schema-driven CSS validation library with a command-line frontend. Uses the GNOME libraries libxml2 and libcroco and the GNU Autotools. I abandoned it right after putting the first version online.
- Regexp::Convert::XMLSchema: a Perl module that parses Perl-compatible regular expressions and helps to convert them into regular expressions supported in XML schema datatypes. Written in 2006.
- IECapt#: a C# version of IECapt I made in response to popular demand. Requests for a C# version then turned into requests for help with the csc build process. Some versions of Visual Studio might lack the tools.
- email analysis: exploratory research into parsing and analysing plain text e-mails to re-format them which requires identifying quoted and preformatted text. There are some low-hanging fruits that can be implemented easily.
- ucdxml2sqlite: Perl script that converts the Unicode Character Database, with information on all the Unicode characters, from XML into a SQLite database using the DBI interface; and normalize.pl, which uses the database to implement UAX #15.
- Connect Four: implementation of the game in Scheme written in 2003. Also in Scheme are a solution to the Knight's Tour problem, a generator for Feigenbaum diagrams, and a renderer for Koch snowflakes.
- SVG Tidy: notes on eventually implementing SVG Tidy, a tool to clean up and re-format SVG documents like HTML Tidy does for HTML documents, along with code snippets. In the early planning stages since 2005, with svg-qa as host.
- W3C Heartbeat: “Each Working Group MUST publish a new draft of at least one of its active technical reports on the W3C technical reports index at least once every three months.” This SPARQL query helps check that, using RDF data.
Test suites and testing materials
- Miscellaneous Web Browser tests: how web browser tests ought to be. Minimal code, accessible results, easy to edit, almost no nonsense. Most of the tests are for the non-standard APIs
- svgtest.org contributions: tests I have contributed to Cameron McCormack's short-lived svgtest.org web site, intended to get the project going. The list archives are gone, but my submission comments are available.
- Markup Validator tests: various HTML and XHTML documents to test features of Validators, including live links to the results of various validators to allow feature comparisons and to demonstrate limitations in them.
- Validator mode detection: around 70 HTML-like documents with various corner cases that pose a challenge in automatically telling HTML and XML apart. The suite also uses various obscure SGML features.
- CSS tests: a small collection of CSS test cases that demonstrate bugs in Windows Internet Explorer 5.5 written in and untouched since 1999. Some of the tests are no longer correct due to changes in the underlying specifications.
- Randomized URI processing tests: fuzz testing can be the best way to understand complex systems and their flaws, but there are few good tools to use. Case in point, Parse::RandGen is incompatible with more recent versions of Perl.
- PerlSAX 2.1 locator tests: parsers tend to discard information about their input they do not seem to need, good line and column positions for instance. I helped to address that in PerlSAX 2.1, through specification and tests.
Web security materials
- MFSA2009-15: Mozilla Foundation Security Advisory 2009-15 titled “URL spoofing with box drawing character”, announcing a fix for a security vulnerability that I reported in 2006, after Moxie Marlinspike exploited it in 2009.
- Opera IDN spoofing: email reporting bug #231859 to Opera Software about an address bar spoofing vulnerability also involving international domain names. Within two days I was told the issue had been fixed. I have yet to verify that.
- XSS on google.com: email reporting a code injection vulnerability on Google's main web site to Google, Inc. in February 2005, on http://www.google.com/url which, ironically, had another XSS vulnerability fixed a few weeks earlier.
- MD5 Decryption: Usenet posting from the year 2000 demonstrating that storing md5($password) is no more secure than storing the plain text $password, by way of a brute-force Perl script. This is still an issue thirteen years later.
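The same demonstration takes only a few lines today — a hypothetical Python sketch of the brute-force idea (the original was a Perl script):

```python
import hashlib
import itertools
import string

def crack_md5(digest, alphabet=string.ascii_lowercase, max_len=4):
    """Recover a short password from its unsalted md5 hex digest by
    exhaustively trying every candidate up to max_len characters."""
    for length in range(1, max_len + 1):
        for chars in itertools.product(alphabet, repeat=length):
            word = "".join(chars)
            if hashlib.md5(word.encode()).hexdigest() == digest:
                return word
    return None
```

Because md5($password) is unsalted and cheap to compute, every user with the same short password produces the same digest, and an exhaustive or dictionary search recovers it quickly.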
Code from the 1990s
- TIE format: incomplete reverse-engineered description of the Star Wars: TIE Fighter video game. Started out of necessity in 1996 – some missions have to be patched to be fun – I still hope to finish it some day, with Internet help.
- tieinf: command-line utility that reads TIE Fighter mission files and prints out the mission goals, including the bonus goals and scores, information that is not available in-game. Primarily allowing you to skip boring mission briefings.
- xirip: based on the 1994 description of the file format, a 1997 utility that extracts instruments from the demoscene music tracker FastTracker 2's .xm music files. Interestingly, music was more “open source” back then than it is today.
- guestbok: “guestbook” would have been too long for MS-DOS, but this Turbo Pascal program is a 1997-era CGI guestbook program, which ran under “Netscape Enterprise Server 3.0b2” on my computer, but never online.
- frei: the first generally useful program I remember writing was frei, which reports on used and available hard drive space. No copy of it has survived; this is a 1997 rewrite from scratch, simpler than the original thanks to experience.
- lgpunp: the video game Final Fantasy VII for the PC stores resources like textures and music files in .lgp files. I analysed the file format and made a utility to extract files from the packages, initially in Turbo Pascal, later in C.
- tex2bmp: converts texture files from SquareSoft's Final Fantasy VII into Windows BMP files for easier viewing and conversion. Two months later I got Internet at home and found out that tools like tex2bmp already existed.
- MatheMatoFix: a school project for the “Informatik” course, written in 1998 in Comal. It features a simple textmode interface and implements routines like a solver for quadratic equations. I also made a Windows GUI C++ version.
- Logfile resolver: resolves IP addresses in web server log files to hostnames by way of reverse DNS lookups. For performance reasons, this had to be done offline back in the day. It could be the first Perl script I published.
Books and publishing
- XSLT: “XML-Dokumente transformieren” (ISBN 3-89721-292-7), the German translation of the book written by Doug Tidwell, published in 2002 by O'Reilly Verlag GmbH & Co. KG, had me as one of the technical editors.
- Building Accessible Websites: by Joe Clark and published by New Riders in 2002. Joe asked me to join as technical editor in October 2001 and so I did, but I was cut short by being conscripted into the Luftwaffe in November.
- Die nordfriesische Sprache: “Die nordfriesische Sprache nach der Moringer Mundart” by Bende Bendsen, published in 1860, on the second language I have learned. I would like to re-publish the book. The link has notes on that.
- Microsoft Most Valuable Professional: “We are impressed by your technical skills and your willingness to share your knowledge with peers.” — Microsoft. Received the award three times so far. Comes with an MSDN subscription.
- de.comp.text.xml: In the year 2000 I proposed the creation of a German Usenet newsgroup for XML. In February 2001 the proposal was accepted in a formal vote with 195 votes in favour, 17 against, 1 abstention, and 1 invalid vote.
- Cat enigma: a regular user on what is now the freenode IRC network since 2003, Sean B. Palmer and I won the “April Fools” cryptography challenge in 2011, an evil puzzle that required many hours of work and frustration to solve.
- learn.to/quote: mail formatting etiquette was a big deal on the German Usenet, but yelling at people to “quote properly” without reference is unhelpful, so I set up this redirect making it easy to reference a tutorial with rationale.
- dciwam newcomers FAQ: posted weekly in de.comm.infosystems.www.authoring.misc, the German web authoring Usenet newsgroup, explaining the group and offering help to new users, ever since I wrote it in the year 2001.
- Die Leiden der jungen Schweizerin: studying deletionism on Wikipedia I witnessed a new user enthusiastically contributing her first article, only to find herself confused, threatened, and abused minutes later.
- björn.höhrmann.de: would be my personal homepage if the Internet worked better. In a chat I once explained and insisted on how to spell my name, and then got this letter. I pay attention to localisation problems ever since.
- bjoernsworld.de: various articles on web development, like an introduction to Cascading Style Sheets, tutorials on special files like robots.txt and favicon.ico, and on providing textual alternatives for images.
- dciwam.de: web site of the German web authoring newsgroup that hosts various articles I have written, like tutorials on choosing font sizes, what XHTML is, and on why the CSS specifications define pixels in a surprising way.
- desperabidos.de: as part of the editorial team making our Abizeitung, a book prepared by students about school and life as they graduate secondary education, I made and maintained a companion web site. Now discontinued.
- fast graphics: I like to create visual artwork during breaks to get my mind off things, using an ancient copy of Photoshop. This is a sample of them that I uploaded to try out flickr. All freely available under a Creative Commons license.
- Validator Logos: Logos I made for the W3C CSS Validator when it looked like this. They feature ‘Woolly’, the CSS sheep, and the traditional logo of the World Wide Web Consortium (they switched to a different shade of blue later).
- Katograph: an interactive treemap written in Adobe Flex and ActionScript that visualises the category system of the German Wikipedia including article counts and article pageview counts. See the documentation in German.
- German economy by revenue: based on data published by the Federal Statistical Office of Germany, an interactive treemap that organizes the business sectors in the German economy by revenue. Useful to compare e.g. theater and cinema.
- Salaries and profits: leading up to 2007, German companies managed to dramatically reduce spending on labour and increase capital gains. This series of diagrams shows that based on projections by the Bundesbank.
- Eurobarometer map: answers to “How comfortable are you with the fact that those websites use information about your online activity to tailor advertisements or content to your hobbies and interests?” across Europe; see details.
- Usenet profile: a diagram showing which Usenet newsgroups I posted to the most in 1999. I made it after learning about the evils of Usenet archives and automatic profile generation based on DejaNews data.
- Related RFCs: based on cross-references and web site traffic data from the IETF web site, clusters of Internet standards, intended to investigate automating finding relevant and related documents. Also available in tabular form.
- Causes of death: to understand public policy discussions, it is often necessary to have comparative data. This treemap compares causes of death in Germany in 2006 based on United Nations World Health Organization data.
- Wikipedia article density: an experiment with the Google Maps API, this map has a heatmap overlay that shows which geographic regions around the city of Schleswig have many geo-referenced Wikipedia articles.
- PNG filter heuristic: The PNG specification suggests a particular technique to select scanline filters. While making pngwolf I found one that significantly outperforms it at similar cost. I would like to put that in a paper for a free journal.
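For reference, the specification's suggested technique — the minimum sum of absolute differences heuristic — can be sketched as below. This illustrates the spec's heuristic only, not the improved one found while making pngwolf:

```python
def paeth(a, b, c):
    # Paeth predictor from the PNG specification.
    p = a + b - c
    pa, pb, pc = abs(p - a), abs(p - b), abs(p - c)
    if pa <= pb and pa <= pc:
        return a
    return b if pb <= pc else c

def pick_filter(row, prior, bpp):
    """Pick a scanline filter per the PNG spec's heuristic: apply all
    five filters and keep the one whose output bytes, read as signed
    values, have the minimum sum of absolute values."""
    def left(i):   return row[i - bpp] if i >= bpp else 0
    def up(i):     return prior[i] if prior is not None else 0
    def upleft(i): return prior[i - bpp] if prior is not None and i >= bpp else 0
    filters = {
        0: lambda i: row[i],                                              # None
        1: lambda i: (row[i] - left(i)) & 0xFF,                           # Sub
        2: lambda i: (row[i] - up(i)) & 0xFF,                             # Up
        3: lambda i: (row[i] - (left(i) + up(i)) // 2) & 0xFF,            # Average
        4: lambda i: (row[i] - paeth(left(i), up(i), upleft(i))) & 0xFF,  # Paeth
    }
    def cost(ftype):
        out = (filters[ftype](i) for i in range(len(row)))
        return sum(b if b < 128 else 256 - b for b in out)
    return min(filters, key=cost)
```

A row identical to the one above it filters to all zeroes under Up, and a smooth gradient to small constants under Sub, so the heuristic tends to pick filters whose output compresses well under Deflate — which is all it optimizes for.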
- NFC is regular: the Unicode Normalization forms are defined through transformation and sorting algorithms. Checking if a string is normalized can nevertheless be implemented with constant memory use and an O(n) algorithm.
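Python's unicodedata module (3.8+) exposes this distinction between normalizing and merely checking: is_normalized can often answer from the quick-check properties without building the normalized copy, while normalize performs the full transformation.

```python
import unicodedata

composed = "caf\u00e9"      # é as a single precomposed code point
decomposed = "cafe\u0301"   # e followed by a combining acute accent

# Checking is much cheaper than normalizing and comparing.
assert unicodedata.is_normalized("NFC", composed)
assert not unicodedata.is_normalized("NFC", decomposed)
assert unicodedata.normalize("NFC", decomposed) == composed
```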
- Webstandards im Wandel: Interview with the c’t – magazin für computertechnik. Printed in the first issue of 2007, Håkon Wium Lie, Mathias Schäfer, and I discuss various topics around web technology and web standards.
- Mobilcom: Tomorrow-Zugang für 3500 Mark: in 1999, my first 8 weeks of Internet at home resulted in a phone bill of around 2300 EUR. I organized with other victims of the accounting error and sought the help of the c’t.