Category Archives: Software

Noteworthy software, apps, and tools.

Time Zones

If you like rabbit holes, a wonderful way to spend your Sunday is in the writing of Zach Holman, an early engineer at GitHub and GitLab.

All are good, but a particular favorite of mine is UTC is enough for everyone …right? You don’t need to code to appreciate that time is a construct that has evolved over time. “At noon in DC, it was 12:08 in Philly.” Time zones introduce particular complexity, and not only because of obvious things like Daylight Saving Time starting and stopping at different times in different places throughout history and geography. If you do write code, you’ve probably come across things like Epoch Time.

The Unix epoch (or Unix time or POSIX time or Unix timestamp) is the number of seconds that have elapsed since January 1, 1970 (midnight UTC/GMT), not counting leap seconds (in ISO 8601: 1970-01-01T00:00:00Z). Literally speaking, the epoch is Unix time 0 (midnight 1/1/1970), but ‘epoch’ is often used as a synonym for Unix time. Some systems store epoch dates as a signed 32-bit integer, which might cause problems on January 19, 2038 (known as the Year 2038 problem or Y2038).
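To make the definition above concrete, here is a minimal Python sketch that converts Unix time to a UTC date and shows where a signed 32-bit counter runs out. The helper name `from_unix` is made up for illustration; only the standard library is used.

```python
from datetime import datetime, timezone

INT32_MAX = 2**31 - 1  # largest value a signed 32-bit integer can hold


def from_unix(seconds: int) -> datetime:
    """Convert Unix time (seconds since the epoch) to an aware UTC datetime."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


print(from_unix(0).isoformat())          # 1970-01-01T00:00:00+00:00
print(from_unix(INT32_MAX).isoformat())  # 2038-01-19T03:14:07+00:00
```

One second after that last value, a signed 32-bit counter wraps around to a negative number, which is the Year 2038 problem in a nutshell.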

I’ve spent far too many hours on the PHP date manual page and the related comments (now gone! I used to have a few; they were probably retired because they were on earlier versions of the language).

As a bit of lore for Zach he might appreciate, I’ll share that when writing some of the first logging and data processing systems for Akismet, I divided the files using Swatch Internet Time to give me a consistent balance of dividing a day, but still doing things as real-time as possible. The anti-spam learning system would update about every 86 seconds.
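For anyone who hasn’t met Swatch Internet Time: it divides the day into 1000 “.beats” of 86.4 seconds each, counted from midnight at UTC+1 with no daylight saving. The sketch below is only an illustration of how that could be used to bucket log lines; the function names and the file-naming scheme are assumptions, not the code that actually ran for Akismet.

```python
from datetime import datetime, timedelta, timezone

SECONDS_PER_BEAT = 86.4  # 86,400 seconds in a day / 1000 beats
BMT = timezone(timedelta(hours=1))  # Internet Time is anchored to UTC+1


def swatch_beats(moment: datetime) -> float:
    """Return the .beat (0 <= beat < 1000) for a timezone-aware datetime."""
    local = moment.astimezone(BMT)
    seconds_today = local.hour * 3600 + local.minute * 60 + local.second
    return seconds_today / SECONDS_PER_BEAT


def bucket_name(moment: datetime) -> str:
    """Hypothetical bucket label: date at UTC+1 plus the whole beat number."""
    local = moment.astimezone(BMT)
    return f"{local:%Y%m%d}-{int(swatch_beats(moment)):03d}"


noon_utc = datetime(2020, 1, 1, 12, 0, tzinfo=timezone.utc)
print(bucket_name(noon_utc))  # 20200101-541
```

Because every beat is the same length everywhere, each bucket covers exactly 86.4 seconds regardless of the server’s local time zone.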

Journalism and Newspack

WordPress.com is partnering with Google and news industry leaders on a new platform for small- and medium-sized publishers, called Newspack. The team has raised $2.4 million in first-year funding from the Google News Initiative, Lenfest Journalism Institute, Civil funder ConsenSys, and the Knight Foundation, among others. We’re also still happy to talk to and engage other funders who want to get involved — I’d love to put even more resources into this.

It’s been a difficult climate for the news business, particularly at the local level. It also breaks my heart how much of their limited resources these organizations still sink into closed-source or dead-end technology. Open source is clearly the future, and if we do this right Newspack can be the technology choice that lasts with them through the decades. Hopefully our 15 years of growth lends some credibility to our orientation to build things for the long term.

Here’s Kinsey in Nieman Lab:

The goal is to both make sure that the catalog of publishing tools as well as business tools they need to be able to run what one hopes is a sustainable news operation are addressed simultaneously. It’s not simply a CMS for a newsroom, but a full business system that enables publishing and monetization at the same time.

Nieman Lab interview

As you have come to expect from Automattic, everything will be open source and developed to the same standards WordPress itself is. We’re working with Spirited Media and the News Revenue Hub on the platform, and we will likely look for even more partnership opportunities from across the WordPress ecosystem. If you’d like to invest or get involved, drop us a line at newspack@automattic.com.

Programming and Writing

I really enjoyed this quote from Brent Simmons in an interview with John Gruber.

I’ve always thought of it this way: a good writer reads a lot of books. They see how other writers solve problems. They pay attention to what’s happening now as much as they pay attention to the classics. Good writers are readers first, but eagle-eyed, careful readers.

I think good developers are the same: they look at other apps. They “read” those apps, the problems they have and how they solve them. They notice trends, they notice new solutions, they notice when things work and when they don’t.

It reminds me of some passages from a book I’m reading right now, Bird by Bird by Anne Lamott:

However, in the meantime, we are going to concentrate on writing itself, on how to become a better writer, because, for one thing, becoming a better writer is going to help you become a better reader, and that is the real payoff. […]

Writing can give you what having a baby can give you: it can get you to start paying attention, can help you soften, can wake you up. […]

Because for some of us, books are as important as almost anything else on earth. What a miracle it is that out of these small, flat, rigid squares of paper unfolds world after world after world, worlds that sing to you, comfort and quiet or excite you. Books help us understand who we are and how we are to behave. They show us what community and friendship mean; they show us how to live and die. They are full of all the things that you don’t get in real life — wonderful, lyrical language, for instance, right off the bat. And quality of attention: we may notice amazing details during the course of a day but we rarely let ourselves stop and really pay attention. An author makes you notice, makes you pay attention, and this is a great gift. My gratitude for good writing is unbounded; I’m grateful for it the way I’m grateful for the ocean.

That’s how I feel about software.

Rails Bashing

Since 7 reasons I switched back to PHP there seems to be a trend of Rails-bashing articles, epitomized by this one which is a fine example of the form until it advocates ASP.NET. Through it all, I still haven’t heard of a startup or web service that failed or succeeded due solely to its web framework or language. These articles are like the celebrity gossip stories of Web 2.0, complete with ad hominem attacks, and just as useless. Hacker News tends to be a fairly high signal source of discussions actually relevant to startups.

On PHP

PHP.net has announced that they will stop development of PHP 4 at the end of this year, and end security updates in 2008-08. (In 2007, their site still doesn’t have obvious permalinks. They do have an RSS 1.0 feed though, remember those?)

PHP 4.0 was released in May of 2000. By 2004, when the first version of PHP 5.0 was released, PHP 4 had achieved complete dominance and was ubiquitous in both script and hosting support.

Fast forward 3 more years and PHP 5 has been, from an adoption point of view, a complete flop. Most estimates place it in the single-digit percentages or at best the low teens, mostly gassed by marginal frameworks. Even hosted PHP-powered services that have no shared-host compatibility concerns, like 30boxes, Digg, Flickr, and WordPress.com, have been slow to move, and when they do it will probably be because of speed or security, not features.

Some app makers felt sorry for PHP 5 and decided to create the world’s ugliest advocacy site and turn their apps into protest pieces at the expense of their users. (Hat tip: Mark J.) They say “Web hosts cannot upgrade their servers to PHP 5 without making it impossible for their users to run PHP 4-targeted web apps,” ignoring the fact that there isn’t a released PHP app today that isn’t PHP 5-compatible and that recent upgrade issues have been caused by PHP itself in point releases. (See WP#3354.) It’s easy to always promote the newest thing, but why, and is it for us or our users?

Now the PHP core team seems to have decided that the boost their failing product needs is to kill off their successful one instead of asking the hard questions: What was it that made PHP 4 so successful? What are we doing to emphasize those strengths? Why wasn’t PHP 5 compelling to that same audience? Are the things we’re doing in PHP 6 crucial to our core audience or simply “good” language problems to solve? Will they drive adoption? How can we avoid releasing (another) PCjr?

I wonder if PHP 5+ should be called something other than PHP. A unique name would have allowed the effort to stand on its own, and not imply something that’s an upgrade from what came before when in many cases it’s just different, not better, from an end-user perspective. Continue to maintain PHP 4 as a kind of PHP-lite. Make it harder, better, faster, stronger.

For all the noise though, this isn’t a big deal. It’s easy to forget that PHP 4 hasn’t had any real innovation in the past 3 years while at the same time apps and services built on top of it have created some of the richest and most compelling user experiences the web has seen. (Née Web 2.0.) None of the most requested features for WordPress would be any easier (or harder) if they were written for PHP 4 or 5 or Python. They’d just be different. The hard part usually has little to do with the underlying server-side language.

Someday on our mailing lists I hope half the words wasted pontificating on “language version wars,” which are even duller than language wars, go toward design, copywriting, information, performance — the things that truly matter.

OpenDNS

OpenDNS is a great idea, well-executed. They took something basic and ubiquitous, DNS, and improved it by adding spell-checking and phishing protection (usability enhancements). They provide the service for free in exchange for monetizing typo search pages. The typo search pages are simple, fast, and generally useful. What I was looking for is usually the first result. There is no software to install, just two settings to change, and they provide a registration-free way to set preferences on their site. John Roberts is a friend from my CNET days and gave me a preview a few days before they launched; I’ve been using it full-time ever since and it has been invisible in all the right ways.

Zeldman Switches

I can now tell my kids about the day the inimitable Jeffrey Zeldman moved from almost 11 years of hand-coding to using WordPress. He wrote a bit about his thinking in Why WordPress? I’m about to walk out the door to go to Austin for SxSW, which last year was amazing and I thought it couldn’t get any better. When I started WordPress I had one or two people in mind who, in my wildest dreams, would someday use the software, and that drove much of the development. Zeldman has switched, and I couldn’t be more honored. Now there’s even more work to do.

Hack WP for NY Times

I was at the New York Times last week and one of the things that blew me away was how many WordPress blogs they’re running. I had seen 2-3, but it turns out they have almost 25 running on both sides of the paywall and they’re going to be doing even more with WordPress as time goes on. (There are also some interesting things going on with Times Select.) There’s more good news: they’re hiring. Khoi Vinh writes “A thorough knowledge of weblog publishing software, especially WordPress, is required.” Sounds like a great gig for whoever gets it. Bringing things full circle, Khoi’s site is one of my favorites and an inspiration for the new WordPress.org.

Search Engine Markshowdown

I decided to run the web page analyzer (excellent tool) against the front pages of a few of the latest and greatest search engines and also do a little analysis of my own. Here are some of the results in one of the only tables you’ll ever see on this site:

             Feedster  Technorati  Google  Yahoo Search
HTML             6.11        3.72    1.18          7.82
Ext. CSS        11.47       11.63    0             1.45
Other            9.10        6.70   15.10          1.72
Total           26.70       22.05   16.27         11.00
Compressed         No          No     Yes            No

Numbers are kilobytes, and may not add up exactly due to rounding. CSS is external, linked files. “Other” includes images and javascript.

Yahoo was the surprise winner here. Their HTML was alright but I think could be reduced quite a bit without losing anything. You’ll note they have the heaviest HTML of the bunch, heavier than other sites showing quite a bit more on their front page. They should probably talk to Doug. Overall though I think Yahoo has consistently been doing great nearly-standards-compliant work in their new designs. Yahoo could save about 67% of their HTML size with compression. Interestingly, Yahoo was the only site to specify ISO-8859-1 encoding; all the others claimed UTF-8.

Google was optimized to the hilt, but it’s kind of silly that they put so much effort into their markup but couldn’t go the last inch and make it valid HTML 4. They could probably make it a bit smaller with some more intelligent CSS usage. At least they don’t have font tags anymore. I think under normal circumstances they would have won, but they have an Olympic logo right now that’s pretty heavy. Google was the only site that used gzip compression for their HTML, but even uncompressed they only weighed in at about 2.4 kilobytes, still the lightest of the group.

Technorati clearly had the smartest markup of the group, and was the only one that validated. (An impressive feat for any website in this day and age.) Their markup is clean as a whistle with excellent structure and logic, and their numbers aren’t bad when you consider that they have a lot of stuff on their front page. This isn’t too surprising since Tantek did it. Their CSS, however, is pretty heavy. It’s strange because it’s very optimized in some ways but bloated in others; I think they could cut a few K from it pretty easily. One smart thing they did is name the CSS file with the date, so it’s versioned by name and they can update it monthly without caching issues. All that said, they’re so far ahead of everything else they don’t need to worry about much. Technorati could save about 53% of their XHTML size with compression.

Feedster has its heart in the right place, but the implementation falls far short. For example, it has an XHTML 1.1 doctype but then has the needless XML declaration at the top, throwing IE into quirks mode. They use CSS in places, but then they have a table with 75 non-breaking spaces in it for positioning. There’s a ton of needless markup, including a full kilobyte of HTML comments. On the bright side, they have the most room to improve. Feedster could save about 61% of their XHTML size with compression.
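The “could save about N% with compression” figures above came from the web page analyzer. For readers who want to estimate something similar themselves, a rough sketch using Python’s standard gzip module might look like the following. The function name and the sample markup are made up for illustration, and a real server’s compression settings could give somewhat different numbers.

```python
import gzip


def gzip_savings(html: bytes) -> float:
    """Return the percentage of bytes saved by gzip-compressing html."""
    if not html:
        return 0.0
    compressed = gzip.compress(html)
    return 100.0 * (1 - len(compressed) / len(html))


# Deliberately repetitive sample markup, in the spirit of spacer tables.
sample = b"<table><tr><td>&nbsp;</td></tr></table>\n" * 200
print(f"{gzip_savings(sample):.0f}% smaller with gzip")
```

Repetitive markup compresses especially well, which is part of why bloated pages show the largest potential savings.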

WordPress.org Search

I’ve ripped out the guts and redone the search on the WordPress.org support forums in the hopes of making it something more people will use. Try it out! The new system searches the wiki (hosted on a different machine), thread titles, recent posts, and does a FULLTEXT post search for the most relevant posts. It has contextual search highlighting (like Google).
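Contextual search highlighting generally means showing a short excerpt around the first hit with the matched terms emphasized. The sketch below shows one way that idea could work; it is not the forum’s actual code, and the function name, the `radius` parameter, and the choice of `<strong>` tags are all assumptions.

```python
import html
import re


def highlight(text: str, query: str, radius: int = 40) -> str:
    """Return an HTML-escaped excerpt around the first match,
    with every query term wrapped in <strong>."""
    terms = [re.escape(t) for t in query.split()]
    if not terms:
        return html.escape(text[: 2 * radius])
    pattern = re.compile("|".join(terms), re.IGNORECASE)
    first = pattern.search(text)
    if first is None:
        return html.escape(text[: 2 * radius])
    start = max(0, first.start() - radius)
    end = min(len(text), first.end() + radius)
    excerpt = text[start:end]
    # Escape the text between matches and wrap the matches themselves.
    parts, last = [], 0
    for match in pattern.finditer(excerpt):
        parts.append(html.escape(excerpt[last:match.start()]))
        parts.append(f"<strong>{html.escape(match.group())}</strong>")
        last = match.end()
    parts.append(html.escape(excerpt[last:]))
    return "".join(parts)


print(highlight("How do I change my theme?", "theme"))
# How do I change my <strong>theme</strong>?
```

Escaping each piece separately keeps user-supplied post text from injecting markup while still letting the highlight tags through.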

When I have some time to get back to this, every section will have a “more of this” link to take you to more results (paged). It does this currently with the wiki search, counting the total results and linking to the wiki search directly if there are more than 5 results. Probably still a few bugs to work out. The fulltext query was taking over two seconds to run until I tweaked the JOIN type to get the MySQL optimizer to use the proper index and join order. Everything should validate as XHTML.

A new system is also in place to inject custom results at the top of the page. We’ve been logging searches for the last few months (over 129,000 so far, about 43,000 unique searches) and I’m going to be working closely with the documentation team to identify which searches are most common and what tailored information would be best to present to users when they search for targeted terms, be it a blog post, an external resource, someplace on WordPress.org itself, a wiki page, or a specific thread. We can watch trends and spikes in searches to identify any problems in the application itself or features that may be insufficiently documented or hard to use.
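Finding the most common searches in a log like that is mostly a counting exercise. Here is a minimal sketch, assuming the log is available as a plain list of query strings; the normalization (lowercasing and collapsing whitespace) and the sample queries are illustrative choices, not a description of the real logging system.

```python
from collections import Counter


def top_searches(queries, n=10):
    """Return the n most common queries after light normalization."""
    normalized = (" ".join(q.lower().split()) for q in queries)
    counts = Counter(q for q in normalized if q)
    return counts.most_common(n)


log = ["Permalinks", "permalinks ", "import  from MT", "permalinks", "import from mt"]
print(top_searches(log, 2))  # [('permalinks', 3), ('import from mt', 2)]
```

The head of that list is the natural place to start when deciding which terms deserve a hand-picked result.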

The work is far from finished, but I think it’s a strong first step into fully integrating search as a support mechanism and bringing the WordPress team even closer to the pulse of the users.

Weeds in the Garden

Under the Iron has an old interview with Scott Johnson that is a good read. Now scroll down to the comments. Dozens and dozens of spam comments. I see this over and over again on MT and s9y sites. What’s terrible is these pages are just as dangerous as dedicated spam blogs. Think about it: I shouldn’t even be linking to it now.

Alex told me the other day about a new type of comment spam he’s been seeing: comments that link to normal blog entries. Well known blogs like Mozillazine. As advanced as tools like MT Blacklist have become, they’re pretty useless in cases like this. Are you going to blacklist Dave Sifry? Molly used to have spam comments on her site all the time. Even though she spent a lot of time and effort dealing with them (a daily chore), they only need to be there long enough for Googlebot to index them for the harm to be done. I’m not dogging on MT here, it’s just that there are tens of thousands of MT blogs out there that don’t have any protection and the spammers are targeting them mercilessly. Domain blacklists don’t scale (spammers can have thousands of domains easily and hijack innocent domains) and centralized registration hasn’t been shown to be effective except against people who don’t like centralized registration, a group that doesn’t include spammers.

People used to say that WordPress doesn’t get spam comments because it’s not popular enough. I don’t think this argument holds water anymore. It’s true that MT has three to four times as many blogs as WordPress, but Serendipity has an order of magnitude fewer blogs than WP and is highly targeted by spammers. I think WordPress has, through design and luck, done a lot of things right with regards to comment management in general. First, we respond to the problem in the core code quickly. Moderation and blacklisting have been in the core for half a year now. All of the WordPress developers are bloggers as well, so we’re pretty sensitive to new techniques in use by the spammers. When early versions of WordPress 1.0 advertised that moderation was on, spammers instantly adapted and started searching for blogs that didn’t have the phrases we used, so in the next nightly build for testers I changed how that worked so it couldn’t be targeted anymore. Then in 1.2 we expanded the already successful moderation to allow powerful regular expressions and target not just the content but things like the number of links in a post. Let’s say that somehow two hundred spam comments did get on your blog, which would never happen in the first place because we’ve had throttling for over a year now: you can easily delete hundreds of spam comments at once in under five clicks. We’re not sitting still either; version 1.3 will have emergent registration based on code originally written by Kitten, so there is a type of automatic whitelisting going on that spammers can’t duplicate because it uses email addresses like a secret key and WordPress never reveals your email address. (So Dave and Mark, stop leaving fake ones!) The code will be flexible enough to adapt for GPG signing for the ultra-geeky in the audience.
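To give a feel for the kind of rule described above (regular expressions plus a cap on the number of links), here is a small Python sketch. It is an illustration of the idea only, not WordPress code; the patterns, the link limit, and the function name are all hypothetical.

```python
import re

# Hypothetical rule set; the patterns and the limit are illustrative only.
BLOCKED_PATTERNS = [
    re.compile(pattern, re.IGNORECASE)
    for pattern in (r"\bcasino\b", r"cheap\s+pills")
]
MAX_LINKS = 2
LINK_RE = re.compile(r"https?://", re.IGNORECASE)


def needs_moderation(comment: str) -> bool:
    """Hold a comment if it has too many links or matches a blocked pattern."""
    if len(LINK_RE.findall(comment)) > MAX_LINKS:
        return True
    return any(pattern.search(comment) for pattern in BLOCKED_PATTERNS)


print(needs_moderation("Great post!"))                        # False
print(needs_moderation("http://a.example " * 3))              # True
print(needs_moderation("Visit my CASINO at http://x.example"))  # True
```

Checking the link count separately from the content patterns means a comment with innocent wording but a pile of URLs still gets held for review.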

Any of these things by itself wouldn’t be very effective, and each method I’ve listed has its flaws and weaknesses, and I know them. Which brings us to what I think is the real reason WordPress, despite its explosion of popularity, still doesn’t get the level of spam other tools do: it’s more trouble than it’s worth. WordPress, to spammers, is an unpredictable and moving target. We’re not resting on our laurels; we have another exciting feature-filled release coming just a few months after the landmark version 1.2. The WordPress moderation system can be toggled to manual mode, which is 100% effective at catching spam, or triggered only when something is suspicious. We’re committed to keeping the cost high and the reward uncertain for spammers, which means you don’t have to wake up every morning to filth on your weblog as well as in your inbox. You can focus on what draws us all to this medium, writing and genuine interaction. Here’s a quote from Molly from a comment she left on Keith’s site:

I wanted open comments. In my situation, MT, despite the wonderful Jay Allen personally helping me on an almost daily basis to deal with comment spam, I was a major target. My ISP refused to continue dealing with me because the server molly.com resided on was brought to its knees twice due to spam floods. I was spending up to two hours PER DAY to undo the spam much less post.

Since switching to WP, I’ve had exactly five emails sent to me automagically for moderation. 3 of them were spam, 2 were just enthusiastic posts with multiple links from a reader.

Either way, I had instantaneous access to accept or delete those posts.

That’s the sort of thing that is incredibly rewarding about working on WordPress. Knowing that your work makes it easy for someone else to do what they love is one of the greatest feelings in the world. No amount of money or recognition can ever match that.