Be Careful What You Match For, You Might Not Get It

So I ran into a really interesting issue in Java regular expression parsing while trying to work on an issue for a customer.

OpenNMS has the ability to listen for syslog messages, and turn them into OpenNMS events. To configure it, you specify a mapping of substring or regular expressions to UEIs (OpenNMS’s internal event identifiers).
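To make the idea of that mapping concrete, here is a minimal sketch of a substring-or-regex lookup. It is not the OpenNMS implementation or its configuration format: the class `UeiMapper`, its methods, the "substrings first, first hit wins" ordering, and the `uei.example.org/...` identifiers are all hypothetical names and assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.regex.Pattern;

// Hypothetical illustration only; not OpenNMS code.
class UeiMapper {
    // One rule: either a plain substring or a compiled regex, plus the UEI it maps to.
    private static final class Rule {
        final String substring;   // null when this is a regex rule
        final Pattern pattern;    // null when this is a substring rule
        final String uei;

        Rule(String substring, Pattern pattern, String uei) {
            this.substring = substring;
            this.pattern = pattern;
            this.uei = uei;
        }
    }

    private final List<Rule> substringRules = new ArrayList<>();
    private final List<Rule> regexRules = new ArrayList<>();

    void addSubstring(String substring, String uei) {
        substringRules.add(new Rule(substring, null, uei));
    }

    void addRegex(String regex, String uei) {
        regexRules.add(new Rule(null, Pattern.compile(regex), uei));
    }

    // Assumed policy: cheap substring checks first, then regexes; first hit wins.
    Optional<String> lookup(String message) {
        for (Rule rule : substringRules) {
            if (message.contains(rule.substring)) {
                return Optional.of(rule.uei);
            }
        }
        for (Rule rule : regexRules) {
            // find() looks for the pattern anywhere in the message.
            if (rule.pattern.matcher(message).find()) {
                return Optional.of(rule.uei);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        UeiMapper mapper = new UeiMapper();
        mapper.addSubstring("CRON", "uei.example.org/syslog/cron");
        mapper.addRegex("foo\\d+: load test (\\S+) on tty\\d+", "uei.example.org/syslog/loadTest");

        String message = "<6>main: 2010-08-19 localhost foo23: load test 23 on tty1";
        System.out.println(mapper.lookup(message).orElse("(no matching UEI)"));
    }
}
```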

The customer saw a huge drop in performance from 1.8.0 to 1.8.1. Basically the only change to the syslog daemon was a change to use Matcher.find() instead of Matcher.matches(). The problem was that they were making regular expressions like this:

foo0: .*load test (\\S+) on ((pts\\/\\d+)|(tty\\d+))

…which weren’t matching. So they changed it to put .* at the front, so matches() would get it:

.*foo0: .*load test (\\S+) on ((pts\\/\\d+)|(tty\\d+))

Upon upgrading to 1.8.1, they saw an orders-of-magnitude slowdown. The reason is that when you haven’t specified an anchor, find() has to figure out the “right” starting point for the match. In doing so, it spins a LOT compared to matches() and its implicit anchors: if it turns out there is no match, it is very expensive to scan all the way through the string, re-applying the regex at each starting position. We figured this out this morning after I put together some benchmarks to show the differences:


regex = \s(19|20)\d\d([-/.])(0[1-9]|1[012])\2(0[1-9]|[12][0-9]|3[01])(\s+)(\S+)(\s)(\S.+)
input = <6>main: 2010-08-19 localhost foo23: load test 23 on tty1

matches = false: total time: 167, number per second: 5988023.9521
find = true: total time: 1264, number per second: 791139.2405
matches (.* at beginning and end) = true: total time: 2598, number per second: 384911.4704
find (.* at beginning and end) = true: total time: 2572, number per second: 388802.4883
matches (^.* at beginning, .*$ at end) = true: total time: 2918, number per second: 342700.4798
find (^.* at beginning, .*$ at end) = true: total time: 2648, number per second: 377643.5045


regex = \s(19|20)\d\d([-/.])(0[1-9]|1[012])\2(0[1-9]|[12][0-9]|3[01])(\s+)(\S+)(\s)(\S.+)
input = <6>main: 2010-08-01 localhost foo23: load test 23 on tty1

matches = false: total time: 128, number per second: 7812500.0000
find = true: total time: 1199, number per second: 834028.3570
matches (.* at beginning and end) = true: total time: 2570, number per second: 389105.0584
find (.* at beginning and end) = true: total time: 2554, number per second: 391542.6782
matches (^.* at beginning, .*$ at end) = true: total time: 2630, number per second: 380228.1369
find (^.* at beginning, .*$ at end) = true: total time: 2595, number per second: 385356.4547


regex = foo0: .*load test (\S+) on ((pts\/\d+)|(tty\d+))
input = <6>main: 2010-08-19 localhost foo23: load test 23 on tty1

matches = false: total time: 87, number per second: 11494252.8736
find = false: total time: 193, number per second: 5181347.1503
matches (.* at beginning and end) = false: total time: 1242, number per second: 805152.9791
find (.* at beginning and end) = false: total time: 28631, number per second: 34927.1768
matches (^.* at beginning, .*$ at end) = false: total time: 1241, number per second: 805801.7728
find (^.* at beginning, .*$ at end) = false: total time: 1242, number per second: 805152.9791


regex = foo23: .*load test (\S+) on ((pts\/\d+)|(tty\d+))
input = <6>main: 2010-08-19 localhost foo23: load test 23 on tty1

matches = false: total time: 85, number per second: 11764705.8824
find = true: total time: 873, number per second: 1145475.3723
matches (.* at beginning and end) = true: total time: 1812, number per second: 551876.3797
find (.* at beginning and end) = true: total time: 1879, number per second: 532197.9776
matches (^.* at beginning, .*$ at end) = true: total time: 1874, number per second: 533617.9296
find (^.* at beginning, .*$ at end) = true: total time: 1865, number per second: 536193.0295


regex = 1997
input = <6>main: 2010-08-19 localhost foo23: load test 23 on tty1

matches = false: total time: 80, number per second: 12500000.0000
find = false: total time: 215, number per second: 4651162.7907
matches (.* at beginning and end) = false: total time: 1339, number per second: 746825.9895
find (.* at beginning and end) = false: total time: 37722, number per second: 26509.7291
matches (^.* at beginning, .*$ at end) = false: total time: 1350, number per second: 740740.7407
find (^.* at beginning, .*$ at end) = false: total time: 1351, number per second: 740192.4500

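The moral of the story: if you’re using Matcher.find(), leave off the leading and trailing .* (you don’t need anchors either, since find() will locate the match anywhere in the string). If you do wrap the expression in .*, anchor it with ^ and $; in all cases, you’ll get the most deterministic behavior from anchoring your regular expressions properly.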
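For anyone who wants to poke at this themselves, here is a rough sketch of the difference between the two calls, using one of the sample syslog lines from the benchmarks. It is not the benchmark harness I actually ran: the class and method names (`FindVsMatches`, `timeFindMillis`, and so on) are made up for illustration, the iteration count is arbitrary, and a naive timing loop like this says nothing precise about absolute numbers.

```java
import java.util.regex.Pattern;

class FindVsMatches {
    // Sample syslog line taken from the benchmark runs above.
    static final String INPUT =
        "<6>main: 2010-08-19 localhost foo23: load test 23 on tty1";

    // matches() succeeds only if the ENTIRE input matches the pattern.
    static boolean matchesWhole(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    // find() succeeds if the pattern matches starting at any position.
    static boolean findsAnywhere(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    // Naive timing loop (hypothetical helper): elapsed milliseconds for
    // `iterations` calls to find() with a precompiled pattern.
    static long timeFindMillis(String regex, String input, int iterations) {
        Pattern pattern = Pattern.compile(regex);
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            pattern.matcher(input).find();
        }
        return (System.nanoTime() - start) / 1_000_000L;
    }

    public static void main(String[] args) {
        String hit = "foo23: .*load test (\\S+) on ((pts\\/\\d+)|(tty\\d+))";
        String miss = "foo0: .*load test (\\S+) on ((pts\\/\\d+)|(tty\\d+))";

        System.out.println(matchesWhole(hit, INPUT));               // false
        System.out.println(findsAnywhere(hit, INPUT));              // true
        System.out.println(matchesWhole(".*" + hit + ".*", INPUT)); // true
        System.out.println(findsAnywhere(miss, INPUT));             // false

        // Compare the non-matching case with and without wrapping/anchoring.
        int n = 100_000;
        System.out.println("find, bare:        " + timeFindMillis(miss, INPUT, n) + " ms");
        System.out.println("find, .* wrapped:  " + timeFindMillis(".*" + miss + ".*", INPUT, n) + " ms");
        System.out.println("find, ^.* and .*$: " + timeFindMillis("^.*" + miss + ".*$", INPUT, n) + " ms");
    }
}
```

The boolean results line up with the true/false columns in the benchmark output; the timings will vary by JVM and machine, so treat them as relative at best.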

New Blog

As if I don’t have enough blogs….  ;)

But I wanted to write about non-techie things, and I kept putting it off, because it felt kind of weird posting them to a blog that is obviously mostly about my tech adventures.  So, I’ve set up a new blog…

me.raccoonfink.com

If you feel like following it, go for it, if not, don’t. :)

Also, I’ve gone ahead and completely reworked my blog, and *cough* replaced it with WordPress, something I thought I’d never do. While WP has a somewhat sordid history and does require the upgrade train more often, it is easier to keep up-to-date, and appears to have a better track record more recently. I’d let the old blog software stagnate and found myself resisting messing with it more and more.

Let me know if you run into any issues. I think some old links will be busted, but the Google sitemap should pick up the new stuff pretty quickly, I hope.

Creating an iMix with Music from the iTunes Store

Sorry I’ve been a bit quiet lately, things have been crazy with work and I’ve only sporadically had time to update Fink stuff (incidentally, if you’re using any of my perl module packages, I updated about 100 of them this week.) I’ll be at WWDC next week if anyone wants to get together.

Anyways, as I’ve blogged about before, one of my hobbies is writing music, and I’ve been using TuneCore for all of my digital distribution to the iTunes Music Store, Amazon, etc. TuneCore has an awesome discussion list for artists using their service called the “TuneCouncil” that ranges from hobbyists like me up to producers and folks representing large and numerous big-name acts. It’s an amazing chance to level the playing field and have a real conversation between artists and others trying to find their way through the new music economy.

Recently, the subject of iMixes came up. An iMix is essentially a playlist or mix tape that you can upload to iTunes. The iMix will show up in the iTunes Store when you view the songs associated with that iMix, and people can rate them, etc. It’s a good way to find new music, based on things you already know you like. For an artist, it’s a great marketing tool: you can make playlists of music that complements your own, and get the word out. TuneCore has a tutorial on creating an iMix on their Marketing & Promotion page, but one thing it doesn’t mention is that as of iTunes 9.0, Apple has changed the interface and you can no longer put tracks you don’t currently have in your iTunes library into an iMix. Previously, you could drag songs directly from the iTunes Store listing into a playlist, whether they were a part of your collection or not.

Thankfully, this is still a possibility if you downgrade iTunes to 8.2.1.

Removing iTunes

First, you’ll have to remove your existing copy of iTunes. Be careful deleting files. There is no warranty for my blog! If it breaks in half, you get to keep both halves! Also note, if you downgrade iTunes, you will have to delete or rename your existing iTunes directory (Home -> Music -> iTunes on Mac).

Windows

On Windows, you should be able to uninstall iTunes through the control panel.

Mac

On Mac OS X, drag iTunes from your Applications folder to the trash, and then drag the “iTunes” and/or “iTunesX” packages from the Library -> Receipts folder to the trash:

Install iTunes 8.2.1

8.2.1 was the last version that had an interface which allowed dragging tracks from the iTunes Music Store interface. You can download them here:

Create a Playlist

Now that you’ve got an old version of iTunes installed, you should be able to create a playlist (File -> New Playlist) and then go to the iTunes Store and search for your songs to add. You should be able to drag from the list on the right into your playlist:

Select your playlist on the left side, and you should see the little circle with an arrow appear next to the name. Click that, and you should have the option of creating an iMix:

That’s It!

For details and other useful marketing ideas, check the TuneCore marketing and promotion page and the TuneCore blog; they’ve got lots of great pointers to other resources.

For A Good Cause, Shave Here

For the first time, I am participating in a St. Baldrick’s event for cancer research. It’s a great cause; I have family members and friends who are either battling with cancer, or are themselves survivors.

My goal is to reach $1000 in donations towards cancer research through the St. Baldrick’s Foundation. If there’s anything you can do to help, I would very much appreciate it, and there are many others out there who can benefit from your help.

Donate Here!

KDE4 Progress

I’ve been making good progress on getting KDE 4.4 (release candidates) working. It’s been quite an interesting ride, in both a good and bad way. =)

First, there’s the fun of 10.6 making it even harder to have code that forks without it accidentally exploding on the CoreFoundation fork-without-exec prohibition. I was able to solve this with a combination of fixes from macports’ kdelibs4, and some of my own code which changes things to use low-level POSIX APIs instead of Qt APIs for some bounds-checking before execution.

Next, there’s the fun of Phonon. KDE 4.4 requires a newer version of Phonon than what ships with Qt (even Qt 4.6). On OSX it gets even hinkier, since the QuickTime plugin for Phonon requires private Qt headers, so the only sane way to build it is to build the Phonon included with Qt, rather than building it as a separate project.

I ended up adapting a patch the Kubuntu folks use to inject a modern Phonon into Qt 4.6. In the process, I finally got around to learning my way around Git (and gitorious), and have set up my own Qt branch which includes my (binary incompatible outside of Fink) patch to Qt to fix plugin-building, Phonon from kdesupport, the kde-qt (formerly qt-copy) changes, and my patches to Qt that split OSX into two platforms, Q_OS_DARWIN (i.e. use raw UNIX APIs, no Core*), and Q_OS_MAC (standard Qt/Mac).

Long story short, I’m getting there. I’ve gotten about half of KDE 4.4 RC1 built and apparently running reasonably. RC2 was just released to packagers, and I’m testing out my move to Qt 4.6.1 from 4.6.0, but once I get everything test-built on 10.6, I’ll go validate everything on 10.4 and 10.5 (including making some DBus fixes for 10.4).

After that, the next thing to tackle is Mono, and then eventually I’ll see if I can get KDE3 building/working on 10.6.

Fink and 10.6

It’s been a crazy couple of weeks, with Snow Leopard out, people are scrambling to fix packages that haven’t been already. I was a slacker in running the seeds this time around, and haven’t really had much chance to give my packages a serious look until recently, but FYI, I am working on getting everything building everywhere I can.

Some notes on popular stuff:

  • KDE3: There were a number of annoying things blocking KDE3, but with the approval of some of the other maintainers, I’ve got a lot of the deps that were failing fixed up, and I’m working my way through a full KDE build and hope to have everything hunky-dory in unstable in the next few days.
  • KDE4: First of all: there will not be KDE4 on x86_64 in the near future. Qt4/Mac 64-bit does not have the Qt3Support framework, which plenty of KDE4 bits still depend on. I’ll definitely be making sure that KDE4 builds fine in 32-bit mode, and in 64-bit X11 though, and after that, well, we’ll see how much work it is to excise Qt3Support from at least the base libraries. In the process, I’m going to try to update it to KDE 4.3.1.
  • Java packages: When I packaged a lot of Java stuff for 10.4 and 10.5, I tried to build them targeting the 1.4 JDK, so it was more likely that built jars would work for most people. Unfortunately, Snow Leopard removes the 1.4 JDK, so I’m updating everything to build with the 1.5 JDK. Most stuff is handled; I’ll be fixing up the rest as I run into it.

If you have packages that you use day-to-day, let me know, and I’ll try to get to them first. I’ve been fixing things up on a first-come, first-served basis based on reports to my maintainer email address(es).

I’ll post here on my blog if I hit any other major milestones. In the meantime, happy Finking. :)

Getting My Feet Wet: The OpenNMS iPhone App

I’ve been spending some spare time working on an OpenNMS iPhone app, and things are coming along just great. As many of you know, I do a lot of work with porting various UNIX C/C++ applications to Mac OS X, but despite now having many years of practice doing such things, I actually have very little knowledge of writing C/C++ code from scratch.

I’ve debugged many a bad header, but up to this point I could count the number of lines of code I’ve actually written where I need to manage my own memory on erm… well, 20 hands? OK, bad analogy.

Still, it was with much trepidation that I approached finally hunkering down and learning Objective-C. The verdict is: not bad. I did have to go through some growing pains learning how scoping and memory management work, but it’s not as troublesome as I’d feared, and the class libraries are pretty robust. In a couple of weeks, it’s nearly feature-complete for what I wanted to get working for a 1.0 release. All that’s left is the alarm detail page, and being able to acknowledge alarms from the app.

The biggest thing I learned was that Instruments and the LLVM static analyzer are your friends! The Clang Static Analyzer is friggin’ awesome: it wraps your build, analyzes the source as it is compiled, and outputs a report that tells you whether you’ve mishandled ref-counted data, allocated without deallocating, and other spiffy things.

I’m stopping to work on getting an OpenNMS 1.7.6 (and next week, 1.6.6) release out the door, but hopefully I’ll have a chance to pick it back up and finish it off soon. I’m still waiting for the OpenNMS corporate iPhone development paperwork to go through anyways.

Without further ado… screenshots:

Outages List
Node Detail (1)
Node Detail (2)
Node Detail (3)
Alarm List
Node Search
About

It’s open-source, so if you want to see my awful code, you can check it out from SourceForge.

KDE 4.2.4 Released to Fink Unstable

Just a note to say that I’ve released KDE 4.2.4 to Fink unstable. And now it’s time for the fun part: big bold red text telling you it breaks stuff.

KDE4/X11 Plasma Desktop on Mac OS X in Xephyr
Working KOffice file associations

Actually, that was just the text saying that I was going to have big bold red text telling you it breaks stuff. Here’s the real thing:

It breaks stuff!

But let me explain: it makes things better! Because of some esoteric stuff relating to case-sensitivity, existing packages, and bugs in Fink dpkg, there were issues on a number of people’s systems with the existing KDE packages and conflicting paths. Of course, the root of the issue is that Fink didn’t have a proper “/opt” type directory, so a number of packages for quite some time have been using “/sw/lib” for that purpose (/sw/lib/qt4-x11, /sw/lib/flex, etc.)

Since I was going to have to move things around anyways to fix this issue, I decided to do it right. As of Fink 0.29.7, the package validator accepts “/sw/opt” as a valid path to root packages. All of the KDE4 packages have been changed to use this new path, so when you upgrade to KDE 4.2.4, you will end up with a nice fresh clean KDE in /sw/opt/kde4/x11 or /sw/opt/kde4/mac (or both).

But wait, there’s more!

I’ve also spent a lot of time fixing bugs and tweaking some fink-specific behaviors so that KDE integrates better with your Fink experience.

  • Fink’s kdelibs4 automatically knows about the usual locations for kde4 files, so all KDE4 apps will start properly without needing /sw/opt/kde4/{x11,mac} in the path. This includes KDE4 apps launched from the Finder.
  • The kdebase-workspace package is now supported for KDE4/X11. That means you can start a full KDE desktop!
  • As a test, I created proper Info.plist files for KOffice, so file associations actually work. Till Adam has been working on a more robust way of doing this in the future, but if I find the time I might work on setting up more associations for common KDE apps in the meantime. (Kommon apps?)

So, for those of you who have already tinkered with KDE4 in Fink, I’m sorry to say it all needs an upgrade. But, on the bright side, once you do, you’ll have a much nicer KDE.

As always, if you run into any issues, please let me know.

The Open Source Philosophy (Continued)

The conversation has continued over at the 451 CAOS Theory blog. In response to my musings on intent, David Dennis asked a great question:

Benjamin,

A question for you (and Tarus). Is this topic important to you because:

  a) You believe it’s an important marketing differentiator for the software you work on vs. competitors
  b) You believe it’s an important philosophical / moral issue worth evangelizing
  c) both

Tackling a) involves traditional marketing objectives around branding, awareness, messaging, positioning, etc. Not necessarily a cake walk, but certainly possible to make progress.

Tackling b) involves changing the way people think and behave, which is much much more challenging.

My response is:

Definitely both.

From a personal point of view, I’ve been involved in open source software since before the phrase was coined, so I do feel that it is at least a personal philosophical issue. BUT I’m also a pragmatist, and I know that arguing purely for philosophy’s sake will not convince anyone.

That said, that philosophy drives me to support the companies that I think are doing it “right.” I work for OpenNMS not just because I think the software’s great, but also because I love that we can compete with “the big guys” by having a better community.

Part of the reason that we get so passionate about it is that a lot of these “does it really matter?” conversations start with the implication that we’re already failing to compete, which is just plain wrong. I’m sure that’s why we probably often sound defensive when we are hoping to sound convincing. ;)

I’m the first to roll my eyes at the “true believers.” While I think that in the end open source is a better way to do things in a “pay it forward” kind of way, I believe it’s better from both a philosophical and a pragmatic standpoint. It won’t work for everyone, but it can work for open source projects as an alternative to big-money funding.

I am a child of the VC tech industry. I’ve worked at startups and I know what it feels like to work on software you think is great only to be shut down and have the IP sold off, just because the VP of sales didn’t do his job. It’s refreshing to work for a company that starts with community first, and grows by being truly profitable, rather than by incurring massive amounts of debt. (See: current economy.) It’s refreshing to not be one of 10 companies the VC bets on, and if 9 of them fail, “eh, oh well, that’s statistics.”

Since we grow as we have profit, rather than funding, the biggest investment we can make is in our time, improving the software, and growing the community. There is nothing wrong with the “fauxpen source” companies’ business model; they are welcome to write good software as best they can, and get market share, but in the end, we do differentiate by our openness and our interaction with the community. When they co-opt the phrase that was meant to be equivalent to “free software” to now mean “kind of free software,” it does pure open source companies a disservice, and it is a lie by omission for them to present their software as “just as free” as ours.

Sure, that’s competition, but that’s why it’s important for us to get the word out that there is a difference.

The Open Source Philosophy

There has been a lot of discussion recently on the Open Source Definition, and the use (and abuse) of the term “Open Source.” One of the things that has been missing from this discussion is a higher-level overview of where the friction between “open source” and so-called “fauxpen source” comes from: intent.

The Open Source Definition arose out of the ambiguity of the word “free” in “Free Software,” as defined by the Free Software Foundation. In the English language, “free” is a loaded term that has two meanings: “freedom” and “costing nothing.” The OSD was created to get rid of some of the emotional baggage that came with the intense philosophical point of view of the FSF, but just because the OSD is more “business-friendly” does not mean that it doesn’t have the philosophy and intent of openness behind it.

This friction comes from two very different approaches to open source that I think have been missing from a lot of the discussion regarding how open source applies to business models. I’m going to call these “community value” and “monetary value.”

In some ways, this dichotomy reminds me of the GPL (GNU General Public License) versus the LGPL (GNU Lesser General Public License). The GPL is a pure open-source license, which guarantees the user’s freedom by making it so that no software that uses GPL software can hide or restrict that use in a derivative work. The LGPL, on the other hand, was a compromise, a pragmatic license which allows one to create free software, but it does not require free distribution of things that link to that software. The LGPL has always clearly been discouraged by the FSF precisely because it compromises the freedoms guaranteed by the GPL.

Open Core: Nurturing Monetary Value (“Lesser” Open Source)

Advocates of using the term “open source” to apply to open-core and similar business models approach open source from a monetary value point of view. It is an approach of pragmatism: you create a business plan, get venture capital to get going, and sell software licenses to (hopefully) eventually pay back the VC firms and continue to grow. In many ways, an open-core business is exactly the same as any other startup. Open Source is not a fundamental philosophical part of the business, it is instead used to cut costs, and to help grow “buzz” about what your company is doing, and perhaps even get some free QA, bug reports, etc. The focus is not on creating a community to draw customers indirectly by improving the product, but to draw customers directly by creating awareness. To meet the demands of the venture capital, it requires a fast-growth, high-yield business model, and the community model doesn’t grow that fast.

In the end, how much work you do nurturing a community is directly a matter of how many you can convert to paying year-over-year for software licenses, or whatever other artificial scarcity you create. Once customers have decided they want the features only available in the up-sell (“enterprise”) version of your software, they have more resistance to changing to another product.

This is growing monetary value. The open-source community in a “monetary value” business is a side-effect of a marketing push to draw licensed customers. If the community goes away, you still have an essentially standalone commercial business that can continue just as if the community never existed. This is not to say the community doesn’t provide value, nor that it doesn’t derive value from the open source portion of their software, only that the business plan itself doesn’t hinge on the openness of the software.

Open Source: Nurturing Community Value (“Pure” Open Source)

On the other hand, “pure” open source business models (services & support, like my employer) approach business from a community value point of view. This is not to say that pure open source companies don’t want to make money, only that to be successful, we compete with much bigger companies by multiplying our value with that of the community. To be able to be competitive with companies with huge amounts of seed money, we can’t afford to treat our community just as a resource to be mined; our community is what makes it possible to support a large user-base with a small number of employees. Our community is made up of like-minded individuals working on this project together with us.

Since we are not beholden to venture capital, we don’t have to get quick returns to maximize stockholder returns. Instead what we need is to work with the community to make great software, and to continue to challenge ourselves to extend and expand our knowledge of that software, so we are able to provide our expert opinions as a service worth paying for. As long as we can pay the bills, give ourselves a comfortable salary, keep customers and the user-base happy, and grow the business, we consider ourselves successful. Our goal is to become the de facto network management platform, and we can do that better by not being in debt to venture firms for years.

On the surface, the value we and our community get from each other may not look that different from that of an open-core company, but looking deeper, there are a number of advantages to the “slow and steady wins the race” methodology we use:

Everyone Gets the “Enterprise” Version

You are not held ransom for features that are only in the for-pay version. You can evaluate the product as it truly is, without time-limited evaluation licenses, crippleware with features missing, or annoying shareware reminders.

No Software Licenses

The software isn’t artificially limited; it is capable of whatever your hardware is capable of, without an arbitrary restriction because the sales VP decided that’s where to draw the line. Note that it’s easy to think that Red Hat is a counter-example, since they charge license fees for installed hosts, but from a freedom perspective, you can install the code on as many hosts as you like (i.e., CentOS) without restriction; the restriction is only on “official” support.

Everyone Benefits from Community Involvement

When the community adds value, everyone benefits. Users help each other with issues, provide patches, documentation, and so on. The community contribution to OpenNMS has been huge — not only major features, but default configuration for large amounts of network gear, which OpenNMS now supports out of the box on every new install.

Everyone Benefits from Commercial Involvement

Since the code is 100% open-source, customers who pay for custom development not only get their own value from the transaction, but the software goes back into the mainline, and benefits everyone.

An anecdote: We had a support customer who paid for some custom development for a somewhat esoteric feature. It was originally done in a branch of OpenNMS something like 5 or 6 years ago. The OpenNMS code base has gone through huge upheavals, refactorings, and architectural changes since that feature was created, but since it was released back into the OpenNMS mainline, when they finally upgraded their production system to an up-to-date OpenNMS release, the feature continued to work. If a consulting company did that same custom work as an HP OpenView plugin, they would have to port/implement it all over again after 3 major version revisions of the upstream software.

100% Forkable

While we make a living supporting OpenNMS, we do not “own” the project. (Although, we do own the OpenNMS trademark — the realities of business in the US require it if we want to be able to protect the name of the project.)

At best we are stewards, but the list of OpenNMS employees is a fraction of the number of “core” contributors, and the core contributors are a fraction of the number of users who have added value to OpenNMS over the years.

We run and guide the project, but only because the community trusts us to do so. Our job is to earn that trust, by upholding the principles I’ve outlined above. When we do so, everyone gets value from OpenNMS. If we fail to do so, the OGP will “vote us off the island,” and if it becomes bad enough, the source is fully available so it can be forked to something the community approves of.

Either Approach is “Better,” but Only One is Truly Open

In the end, it’s like the difference between a stable community bank that personally knows every person it gives a loan to, and an investment bank bringing in money fast with high-risk derivatives; they are focused on providing value to two entirely different sets of people. Either approach is better depending on your point of view; they each have their advantages. However, in the end, I believe it is disingenuous to claim that open-core and similar business models have as much right to the phrase “open source” as pure open source businesses.