Archive for the 'Programming' Category

Gum - a free software media distribution system

Since the start of this year, in my spare time outside work, I have mostly stayed away from that other project (GIMP) and have been designing and writing a GTK+ application in C for distributing high-definition video, audio and documents over the internet. There is a big opportunity in the media space, and many apps are built around commercial content with DRM. We need something in our own community backyard, so Gum is a media distribution system that doesn’t implement DRM and sidesteps the need for piracy. But this is not the only difference.

The following are going to be Gum’s 1.0 features, and I will continue to work on it in my free time (but you can help me speed this up—see further below):

  • Support for playback of high-definition, full-length video and audio, and for viewing documents (similar to Acrobat Reader / Evince).
  • Support for multiple audio-streams for different languages, or things like director’s commentary (video, audio)
  • Support for multiple video angles (video)
  • Support for chapters (video, audiobooks, real books)
  • Support for subtitles (video, audio - lyrics and text, documents - text and annotations)
  • Multi-channel audio and high-definition video (1080p)
  • Full UTF-8 i18n, localized into various languages with a country-specific cultural focus
  • There is no DRM. There will be support for exporting these media into portable files for use in other applications, in addition to support for portable devices such as the iPod and eBook readers.
  • We will start with existing content from various publishers such as Librivox, Archive.org, Magnatune, PLoS, etc., and will add an upload facility later.
  • P2P system for handling high traffic loads (details further below).
  • Gum is and will remain a free software application licensed under the GNU GPL.
  • Ad-free Creative Commons, public-domain and other freely licensed content.
  • We will have commercial content with advertisements. This means convincing publishers such as O’Reilly and Hollywood and Bollywood studios to publish on Gum. Ads will pay for the plays just as on television, so hopefully viewers won’t have to pay for content. This could kill the need for many forms of the digital copying and redistribution called ‘piracy’, as everyone would have free and convenient on-demand access to content. We have a reasonably nice implementation that combines P2P, advertisements and the ability to export to files.
  • We will also likely have support for purchasing digital content without ads (and without DRM).
  • Banu’s website will contain the web-application, with categorized browsing, metadata, previews, accounts, bookmarks, channel subscriptions, comments, ratings, charts, suggestions, etc.

These are a LOT of items for a single person to do quickly for 3 types of media (video, audio and documents). But everything in the list above is doable now. There are excellent free software libraries to implement all of the functionality, such as libc, GTK+, poppler, GStreamer and libavcodec (ffmpeg), GNet, libxml2, Cairo, the Tango icons, etc., and it can all be ported to multiple platforms. I know how to implement both the GTK+ application and the web application (you can see some of the proof below and on Banu’s website). Let’s make Gum the standard free software system for digital media distribution, free of DRM issues and other blues. So what needs to be done? I’m posting about it here because I need help, and any advice you may want to share. Here is what the program currently looks like. It is a GTK+ app, so it follows the current GTK+ theme, and it can be themed to look different too.

So far, much of the main multi-threaded app is implemented, including the UI, update checking (including XML schema changes), and XML-over-HTTP communication between server and client. It is written in C using GObject and follows GTK+/GIMP’s coding style. The Banu website framework with user account management is also ready (the public site is an old branch). Currently I am working on the backend database.

Gum search page
Gum video page
Gum documents page
Gum transfers page
Gum preferences dialog
Gum about dialog
Gum in terminal


Help!
The reason I’m posting this here prematurely is to ask for help.
I want to work full-time on Gum and, if possible, have one or more co-developers to speed up its development. Banu will also require hosting for the web application and content (machines, disk space and bandwidth; we’ll take care of the distribution software).

With bambi eyes, I am looking towards a free software company such as Red Hat / Sun / Novell / Nokia / etc. (any employees want to take this up?), media houses such as Sony, Warner Bros or Paramount/Viacom, or rich individuals who want to bring about DRM-free media, to help me by funding salaries and providing hosting, if they think such an application will benefit the CC / free software community and the general public. I can’t offer a percentage of Banu as collateral for this. The reason is simple: there’s a mission, a plan to do good with Banu and Gum, and at this point I don’t want to be influenced away from it. There’s a solid plan and it just remains to be achieved. If Gum gets funded and eventually generates commercial revenue, I will thank them appropriately *Grin*. How much can a company lose by paying the salaries of just one or two more developers to do good things? Come on, please try and help!

About me: I have an MS in computer science. You can go through my resume too. If you can help me, please contact me. I’m also open to advice and criticism (if you’re reading this blog on a syndicated site, I’m a member of that community and you can tell me there).

Why yet another iTunes-like GTK+ app? Or why not extend an existing app?
Sorry if I have offended anyone by this, but although Gum seems similar to apps like Rhythmbox and Banshee, it isn’t. It has to support multiple media types, do P2P (which is not straightforward because we are going to have ads in here) and integrate heavily with web services. It’s perhaps possible to start with one of these apps, but it would likely be more work than starting from scratch, especially for a developer who doesn’t already understand these projects. So the code was started from scratch.

What P2P system does this use?
The app is being written to use a P2P method similar to BitTorrent to handle media distribution under heavy load. It isn’t exactly BitTorrent due to implementation differences, but the protocol structure is very similar and it will have that system’s benefits. I am also looking at integrating UDP NAT hole punching (with the tracker’s help) and STUNT, so that connections can be made through firewalls without inconveniencing users with manual configuration.
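Tracker-assisted UDP hole punching usually works like this. Each peer talks to the tracker from a single UDP socket. The tracker tells each peer the other's public IP:port as it observed it. Both peers then send datagrams to each other from that same socket, so each NAT opens a mapping for the return traffic. Below is a minimal sketch of that sending half using POSIX sockets. The function name `udp_punch`, the probe payload and the retry count are illustrative assumptions, not Gum's code. The sketch only helps with NATs that keep the same external mapping per socket. It also covers UDP only; STUNT addresses the TCP case.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: send `attempts` small datagrams from an already-bound
   UDP socket (the one used to talk to the tracker) to the peer's public
   endpoint, as reported by the tracker. Returns the number of datagrams
   successfully handed to the kernel, or -1 if peer_ip is not valid IPv4. */
static int udp_punch(int sock, const char *peer_ip, unsigned short peer_port,
                     int attempts)
{
  static const char probe[] = "punch";   /* illustrative payload */
  struct sockaddr_in peer;
  int sent = 0;

  memset(&peer, 0, sizeof peer);
  peer.sin_family = AF_INET;
  peer.sin_port = htons(peer_port);
  if (inet_pton(AF_INET, peer_ip, &peer.sin_addr) != 1)
    return -1;

  /* Several probes, since the first ones may be dropped by the peer's NAT
     before the peer has sent its own probes towards us. */
  for (int i = 0; i < attempts; i++)
    if (sendto(sock, probe, sizeof probe - 1, 0,
               (const struct sockaddr *) &peer, sizeof peer) >= 0)
      sent++;

  return sent;
}
```

Reusing the tracker socket is the key design choice here. A fresh socket would usually get a different external port, so the endpoint the tracker handed out would no longer match.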

What media codecs can this system support?
For video/audio, anything GStreamer and libavcodec (ffmpeg) supports, including 1080p H.264, MPEG-2 video, Theora, MP3, 5.1 channel AC3, Ogg Vorbis, AAC, etc. For documents, PDF and DjVu.

How is this different from Democracy Player, Bittorrent, Azureus, iTunes, Joost, Google’s plans, etc.?
Each of these apps has a slightly different feature set, and Gum has its own way of doing things: Gum is specifically for high-quality content that is downloaded before playback rather than streamed, with the ability to view items free of cost with ads. There is no DRM. Gum handles 3 media types (video, audio, documents) and uses P2P.

How can I keep an eye on what’s happening? How can I help?
I’ll post updates on this blog (syndicated wherever you’re reading this). I’ll publish Gum’s source code once the backend and app code have stabilized a bit, and request gnome.org project support at that point (if you know me, ask me for access).

Please tell everyone you know, or post on your blog, that I’m asking for help so that I can work full-time on Gum and pay for/host Banu’s servers, and perhaps bring on one or more co-developers to speed this up.

Auto-clipping after transformations in GIMP

Following my last post on this subject, in which only the raw algorithm for finding the largest rectangle was complete, the UI and PDB changes for using this functionality are now done too (and several edge cases were fixed in the process). I’ll commit it once it gets some external testing from some of our esteemed customers ;) All affine transformations now present a combo box with clipping modes:

GIMP autoclip feature

Here is an example with rotation, using the excellent Wilber splash draft by Paul Davey and Jimmac (still no window decoration in the screenshots):

Wilber before rotation
Wilber during rotation
Wilber after rotation and crop-to-result

Rockin’ in the free world

Sun, you’ve rocked my day by freeing the Java Development Kit as OpenJDK. Congratulations to all the people at Sun and outside who lobbied hard for it and didn’t give up hope.

Java Duke rocking in the free world

Smooth smoothing :)

Following up on the last post, we now have smoother loglog smoothing, for the Fractal Explorer plug-in of GIMP. Mmmm. Will be committed after some cleanups of the plug-in code though.

Classic smoothing (or lack of it):
Mandelbrot set without smoothing

loglog smoothing:
Mandelbrot set with loglog smoothing

On finding bugs

I keep saying that cooking, unless one’s doing it for recreation, is crazy, as one spends far more time cooking and cleaning up than eating the result. Fixing bugs is something like this, but worse. The code was probably written by someone else back in 1999, and somehow the bug has gone undiscovered all along; you know where the bug is; you probably even know how to quick-fix it; but to really fix the bug knowing what you’re doing, you have to understand the code. In a graphics application you spend even more time following the code. You spend a few hours doing that, and then the patch to fix it is 6 lines long, and it’s something very, very simple. You put an if somewhere to handle a condition.

Last week, the Fractal Explorer plug-in of GIMP had a bug like this. The code for handling the loglog checkbox was added back in 1999. It has never worked on the complete classic Mandelbrot set (or if it ever did work, that must have been due to a bug in some other product). It looks like nobody ever used it that way. Kevin Cozens was porting the plug-in to GEGL when he discovered that it crashed with the loglog setting (#372671).

GIMP's Fractal Explorer plug-in

For a Mandelbrot set, the classic method of rendering is to map the escape loop counter (once the modulus has escaped) directly to a colormap. Because this counter is an integer, the method causes banding due to non-smooth shading. The loglog implementation uses a method by Linas Vepstas to renormalize the escape loop counter by measuring how far |z| has strayed outside the escape radius. It involves calculating a double log to map z^(2^i) to i, i.e., log (log (|z|^(2^i))) / log (2). The final divide by log (2) makes the outer logarithm base 2, which reduces 2^i to i. For reference, the end-result normalized loop counter is calculated as: mu = N - log (log (|Z(N)|)) / log (2), where N is the integer escape counter.

The problem with this method was in the “inner lake” of the Mandelbrot set, where for example c can be 0 + 0i: there the loop always exits after a full run, with the loop counter at the maximum value and |z| < 2.0. When 1 < |z| < e, log (|z|) < 1, so the double log is negative and mu ends up larger than N. When |z| = 1, log (|z|) is 0 and the outer log returns -inf; when |z| < 1 (including |z| = 0 at c = 0 + 0i), the outer log returns NaN. Subtracting these from N makes mu +inf or NaN. The resulting mu was used to look up RGB values for the point in the colormap array, so the plug-in crashed whenever mu was not a finite number or exceeded the length of the colormap array.

So to fix this, we check whether |z| < e at the end of the escape loop and, in that case, clamp mu to N - 1. The result:

Classic smoothing (or lack of it):
Mandelbrot set without smoothing

loglog smoothing:
Mandelbrot set with loglog smoothing

Open them in two tabs to see how they differ. Of course, as we still use entries directly from a colormap (after rounding), it isn’t as smooth as we’d really like, but interpolation using the real values is an exercise for another day. So fixing the actual bug took a 2-line if condition. Imagine hours spent studying the algorithm, and then a 2-line fix (and perhaps this lousy blog post). The plug-in still needs work: the code for the fractal algorithms is duplicated between the actual filter and the dialog’s preview, and should be moved into a common function. We’d also like to implement really smooth shading.

Do you find all this cooking interesting? Do you want to be a GIMP developer? Come and participate by picking an open bug and hacking it into submission. Join #gimp on irc.gimp.org to talk to developers, or the gimp-developer mailing list. Or you may want to try #gegl if you think you’ve got the sk1llz.

Looking for a job?

Lulu.com is hiring web software developers for their London, UK office. Lulu is a self-publishing company popular with many authors in academia, who write free textbooks. What Lulu does is make an on-demand real printed book out of a digital book (PDF and cover images). So if you have a PDF manual, for example, you can publish it on their website (for free) and provide a link for people to purchase a printed copy. They also provide other services to authors, such as ISBNs and getting books listed in bookstores like Amazon. Lulu was founded by Bob Young, the former CEO of Red Hat.

I’m writing about it here as they are a superbly friendly place (gathered from speaking to people in person), and have a casual atmosphere with a good work ethic. They use free software and agile methods and are bright enthusiastic people. When I visited them, I felt that it’d rock to work there. They mentioned that they were seeking web developers who were talented at free software based development on Linux.

So if you are good with free software and are looking for a job involving web development in London, write to Kimberly Richards (krichards at lulu dot com) with your résumé. Send answers to the Lulu quiz too! *Grin*

The Lulu quiz

This is why you should be a GIMP developer

We were just having a discussion in #gimp about data collection and privacy on the web, and about things which are done in our best interest:

<mitch> Other people who have bought these books: ....
        Other people who have fucked your wife: ....

Highly recommended!

This private quote was used with permission, people: something you should keep in mind when you build your products and services.

Sven, Mitch and GIMP

bolsh: I too want to say that Sven and Mitch are two great people to work with on a project. They are very helpful and excellent models (in the people sense ;) ). They guide you well if you’re trying to implement a feature or bugfix. I’ve also found Sven abrupt at times, but I know that it’s because he’s being upfront, and it’s harmless. There are so many things to do, and so few people with so little time, so it can sometimes be frustrating. Language is also a barrier for non-native English speakers.

Btw, to all GNOME and other programmer people: GIMP is always looking for more developers. GIMP is well designed and very easy to write code for. So if you want to write some cool code, check out the open list of issues and enhancement proposals and see what you can help with. Also read HACKING and the plug-in development documentation. If you want to work on an issue or enhancement, or even something cool you think you want to do for GIMP, you can either join #gimp or use the gimp-developer mailing list and discuss it.

Demo of tracking you via the web-browser’s cache (no cookies)

Following the earlier post about tracking people using the web browser’s cache, here is an implementation of this issue.

To quickly recap: clearing your browser’s cookies is not sufficient to stop identifying information in your browser from being sent to a website. IMHO, this is a pretty serious issue.

Clearing cookies is not enough to save your privacy

I’m sure someone else has thought of this one before, but anyway, time for the next thoughtful post. This post covers more than one topic, but you should know about all of it.

Update: Looks like others have thought of this idea before after all! Aww shucks, there go your cool points :( .

Cookies are a popular way of tracking what you do on the net, not just on some website you visit, but across the entire net. As an example, suppose you use a web-based email service. You get plenty of personal private email there, from your girlfriends, your bank, job websites, shopping websites and pr0n websites (this should be enough to satisfy any entity’s thirst for getting their hands on private material, but that’s not nearly all, as we’ll soon see).

A digression: we fight so hard to protect our privacy against the government, and one of the arguments is that private data eventually falls into the hands of some evil public corporation, which then uses it to do evil things. Okay, this is a valid argument. So why the hell do we trust our email to free web-based email providers like Yahoo Mail, Gmail, Hotmail, etc., some of which didn’t even guarantee that your email would be deleted from their servers when you deleted it? It doesn’t make sense. Would you be okay if your telephone provider said, “Hey, here’s a service. Let us record all your phone conversations and store them on our server so you can check them out later. We’ll store 5 years of your phone conversations as MPEG audio files. You can play them back anytime you want! :) ” Would you settle for that? I tell you, it’s getting to that. One day your phone will be free and they’ll have targeted ads based on what you talk about with your friends. These ads don’t necessarily need to come via your phone, when the same provider controls various services such as search and email apart from your phone. Wanna buy your love something for Valentine’s day? They know.

A lot of the issues we face in protecting privacy (bloggers getting arrested and whatnot) would not happen if the relevant information was not collected in the first place. Then companies would not have to resort to “the government is asking me for this information; we have to follow local laws”. Don’t collect this information in the first place! Don’t log my searches. Don’t link them to my IP address. Don’t link them to my account which I created for an email service. Don’t log my IP address against my account. Some providers say they do this for our convenience, so we have a better experience. Sure, convenience over security, but at least give me the right to delete my tracked information from your collection immediately (not after the next full moon) and permanently. And do not share my information with third parties, as there’s no way of controlling its privacy then.

Okay, back on topic. So yeah, you use this free email service. Now this provider has other services which serve ads and track site statistics, not just on their website but on thousands of other websites, and many of the websites you visit are on that list. This is not a figment of imagination: there are many such providers out there with different combinations of services. So every time you visit some favourite website, your email provider can know you have visited it and can collect this information. Every time you read a particular article on a website with certain keywords (say physics, or xxx, or MP3 player review), your email provider knows and can link it to your account. What’s worse, they can collect information on the people who email you, where they discuss some topic with you.

Clearing cookies may not be as effective as you think. Your browser’s cache is a valuable store of information. A JavaScript .js file that is generated dynamically when requested can have a unique tracking ID embedded in it, and can stay in your browser’s cache more or less permanently when sent with the right HTTP cache-control headers. Pages can then load this JavaScript file. The script is never re-requested, so it keeps the unique ID, and it can call resources on the server side to track you. They just need to associate this unique ID with your account once (when you first log in after the ID was created), and then they can set cookies again later and track you anyway. The result is that you can be tracked uniquely even after you clear your cookies (i.e., as if you had never cleared your cookies to generate fresh ones).
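Here is a minimal C sketch of the server side of the cache trick described above. It assumes a hypothetical handler that has already picked a per-visitor ID. It formats the HTTP response for a dynamically generated .js file that embeds the ID and carries a long-lived Cache-Control header. The one-year max-age, the header choice, the `visitorId` variable name and the function name are illustrative assumptions, not taken from the demo.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: build an HTTP response for a script that embeds
   `visitor_id` and tells the browser to cache it for a year.
   Returns the number of bytes written, or -1 on bad input or a short buffer. */
static int build_cached_id_script(char *buf, size_t len, const char *visitor_id)
{
  int n;

  /* Refuse IDs that would break out of the JavaScript string literal. */
  if (strpbrk(visitor_id, "\"\\\r\n") != NULL)
    return -1;

  n = snprintf(buf, len,
               "HTTP/1.1 200 OK\r\n"
               "Content-Type: application/javascript\r\n"
               "Cache-Control: private, max-age=31536000\r\n"   /* one year */
               "\r\n"
               "var visitorId = \"%s\";\n",
               visitor_id);

  return (n < 0 || (size_t) n >= len) ? -1 : n;
}
```

The browser answers later requests for the script from its cache without asking the server again. The same ID therefore keeps coming back until the cache entry expires or the cache is cleared, and clearing cookies alone does not remove it.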

What can they do with all the collected information? Serve you relevant targeted ads. The side effect is that this nice little collection can be shared with other parties (thanks to a thoughtful TOS lawyer), and with the government when forced, to oppress their people. If you think this is an overreaction, it has happened and it will happen.

Companies exchange private information. They say so in their TOS, which you usually ignore. For example, on August 4, 2006 I was contacted by a script at Google about my Sourceforge.net project, asking me whether someone else should be allowed to create a project on Google’s project hosting service with the same name as the Sourceforge.net project. Let’s ignore the fact that this email was sent by a script and was unsolicited. How did they know my details? They must have had a database of all Sourceforge.net projects with the owners’ email addresses and other details. I was quite unhappy about it. I created an account on Sourceforge a long time ago in good faith, but without thinking about this. Lennart Poettering has blogged about another example of private data sharing, between Canonical/Ubuntu and Debian. These two are “community examples” of websites in our midst. It makes you wonder what happens elsewhere.

Think twice about using 3rd-party services which collect information about you, esp. when you can avoid using them. I always suggest that people get their own yourdomain.org and run their own email (with web-based access), website, Jabber server, etc. These interoperate with everyone else’s, and it takes 1 day to set everything up properly on your broadband connection (if you can’t afford to colo in a datacenter). You have complete control, and can do crazy hacks to filter your emails, generate custom dynamic content on your websites, get notified about stuff via Jabber/XMPP, and what not. And if you are worried about backups of your Maildir, website database, code repositories, etc., you can ask a few friends to store PGP-encrypted incremental backups for you and return the favour. And use Adblock, which not only blocks ads but can also block these nasty tracking scripts. All this takes 1 day of your weekend to set up.

Firefox should perhaps get a patch adding a setting to clear the cache when the browser exits.

Update: Colin Leroy wrote to tell me that Firefox (versions 1.5 and above) already has such a feature, accessible via the Settings button under Edit → Preferences → Privacy. It could be made more conspicuous, like the “Keep Cookies” setting.