Sunday, December 29, 2019

Replacing Your Oracle JDK with an Open Source Alternative

Oracle says that Java is still free. That is true in the sense that the platform is still developed and distributed free of charge on Oracle's web site. What is no longer free is using Oracle's distribution outside of the update cycle. You can still use the Oracle JDK freely if you always stick to the latest supported version. This essentially means you have to update the JDK you use every 6 months, even if the release is an LTS (Long Term Support) release. If you do not do this, then you have to pay Oracle for support. It is also not an option to just keep running on the unsupported version without any upgrades; that is still a violation of Oracle's terms and conditions. Updating to a major release every 6 months is not a tempo that most projects can sustain; just getting the tooling to follow suit will drain the energy out of most projects. Luckily there are alternatives to using the Oracle distribution, and this is why Java truly still is free. I will go through some of the alternatives here and give you my take on what to select.

I have only selected to talk about distributions that are actually free in production. Others, like the Red Hat build of OpenJDK, are free only for development; for production you need a support agreement with Red Hat. This does not mean that such a distribution would not be a good choice for you, especially if you already have a support agreement with Red Hat, but I am focusing on free as in beer. The alternatives I will talk about are the following:

  • Amazon Corretto
  • Azul Zulu
  • AdoptOpenJDK
  • OpenJDK (built from source yourself)

The first thing that should be noted when talking about these alternatives is that they all build on the same source code. They are all based on the OpenJDK project, so there are no feature differences between the distributions. What varies is which parts are built into the distribution, the platforms supported, the release cycle and the installers.



|  | Amazon Corretto | Azul Zulu | AdoptOpenJDK | OpenJDK |
|---|---|---|---|---|
| Versions Available | LTS versions, currently Java 8 and 11. | LTS versions and the latest version of Java, currently Java 8, 11 and 13. | All versions of Java from 8 on. | All versions of Java from 8 on. |
| Platforms Supported | Windows (x86, x64), Linux (x64, aarch64), macOS (x64) | Windows (x86 64-bit), Linux (x86 64-bit, ARM 64/32-bit), macOS (x86 64-bit) | Windows (x86, x64), Linux (x64, arm32, aarch64, ppc64le, s390x), macOS (x64), AIX (ppc64) | Build it yourself and it will (likely) work. |
| Docker Support | Yes, Linux only. Smallest image is around 250 MB. | Yes, Linux only. Smallest image is around 55 MB. | Yes, with images for all platforms, including Windows. Smallest image is around 80 MB. | You can build your own Docker image. |
| Installers | MSI, PKG, RPM, Deb and ZIP/Tar | MSI, DMG, RPM, Deb and ZIP/Tar | MSI, PKG and ZIP/Tar | ZIP/Tar |
| Support | Available via AWS Support Plan. | Can be purchased from Azul. | Can be purchased from IBM. | No support. |
| Release Cycle | Quarterly, with possible out-of-cycle releases for serious issues (e.g. security). | Quarterly. | Quarterly. | Whenever you build a new release. |
| Components | No OpenJFX included. | OpenJFX downloaded and installed separately. | No OpenJFX included. | OpenJFX needs to be built separately. |

I've mainly included the OpenJDK column to show what these distributions are doing for you. I would not expect you to actually build from source unless you have some very specific needs (and remember, if you do that and distribute the result, the license requires you to make your changes available). As you can see, there is generally not that much difference between the distributions, unless you need a specific platform that happens to be available in only one of them. For production you should only run one of the LTS versions (currently 8 and 11; the next one is 17), so unless you plan on going into production in a couple of years, it does not really matter whether the latest release of Java is available. I would recommend the Azul Zulu distribution, as it is the most versatile in terms of available versions and release cycle, and it has excellent Docker support. Also, if need be, you can purchase support. The exception is if you run on AWS, in which case you might just as well take Corretto.

Monday, June 13, 2011

JavaFX 2.0: Using the WebView Component in a Swing Client

Oracle has released a beta version of the new JavaFX 2.0, which takes a new direction compared with previous versions. JavaFX Script is gone, and the intent moving forward is to have a Java library roughly on the same level as Swing. Although the introduction of the scene graph is a clear conceptual break from the Swing interface, it is much more familiar to anyone coding Java. In addition, the ability to mix JavaFX and Swing inside the same application offers a very compelling migration path: you can port pieces of your existing application as opportunities arise instead of taking an all-or-nothing approach. So it is definitely worth looking into JavaFX for your UI needs. I am still doubtful as to how much it will spread as a RIA platform; it seems to me that HTML5 makes a much more logical choice now. However, it definitely offers some interesting options for desktop applications, i.e. as a replacement for Swing in the long run.

I have just played around a bit with the beta SDK, with a focus on the Swing interaction. I've made a small application based on the samples and tutorials that come with the SDK, and it was overall very easy to work with. I focused on the WebView component, a very interesting web component for Java. It boasts support for JavaScript and HTML5, so it has the essentials covered. In addition, and contrary to most other web components out there, it is a lightweight component. This means, in essence, that it mixes nicely with the other components in Swing and JavaFX. Heavyweight components will not respect glass panes and layers in a Swing application, which can be problematic if you depend on those.

To test out how well the integration works, I decided to integrate with the famed JLayeredPane from Swing. This is one of the components that usually proves the most challenging to work with. I've put together some simple code that shows a WebView in a layered pane together with a JLabel. The JLabel sits on top of the WebView and is transparent, and it seems to blend in very nicely. In the example I also played around with injecting some JavaScript into the web page, in this case to change the background color. I also tried to see if the current API supports callbacks from the JavaScript to the Java code, but I didn't manage to find a clean way of doing this. A hack I came up with was to use the prompt method from JavaScript, which can be intercepted in the Java code, but this does not feel clean at all. If anyone has a good way of doing this, please leave a note in the comments.

Below is the code that I wrote. You can run it after you've installed the JavaFX 2.0 Beta on your system. I ran it from the IntelliJ IDEA IDE and had the JavaFX libraries in my classpath.
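The original listing did not survive here, so instead of the exact code described above, here is a minimal sketch of the same idea. It is written against the later, stable JavaFX Swing-embedding API (JFXPanel) and Java 8 syntax, so the launch and threading details differ from the 2.0 beta code; the URL and bounds are placeholder values.

import javax.swing.JFrame;
import javax.swing.JLabel;
import javax.swing.JLayeredPane;
import javax.swing.SwingUtilities;

import javafx.application.Platform;
import javafx.concurrent.Worker;
import javafx.embed.swing.JFXPanel;
import javafx.scene.Scene;
import javafx.scene.web.WebEngine;
import javafx.scene.web.WebView;

public class WebViewLayeredPaneDemo {

    public static void main(String[] args) {
        SwingUtilities.invokeLater(WebViewLayeredPaneDemo::initFrame);
    }

    private static void initFrame() {
        JFrame frame = new JFrame("WebView inside a JLayeredPane");
        JLayeredPane layeredPane = new JLayeredPane();

        // JFXPanel is the Swing component that hosts a JavaFX scene graph.
        JFXPanel fxPanel = new JFXPanel();
        fxPanel.setBounds(0, 0, 780, 540);
        layeredPane.add(fxPanel, JLayeredPane.DEFAULT_LAYER);

        // A transparent label layered on top of the (lightweight) WebView.
        JLabel overlay = new JLabel("I am a JLabel on top of the WebView");
        overlay.setBounds(20, 20, 400, 30);
        layeredPane.add(overlay, JLayeredPane.PALETTE_LAYER);

        // JavaFX nodes must be created on the JavaFX application thread.
        Platform.runLater(() -> {
            WebView webView = new WebView();
            WebEngine engine = webView.getEngine();
            engine.getLoadWorker().stateProperty().addListener((obs, oldState, newState) -> {
                if (newState == Worker.State.SUCCEEDED) {
                    // Inject some JavaScript into the page, here to change the background color.
                    engine.executeScript("document.body.style.background = 'lightyellow';");
                }
            });
            engine.load("http://www.oracle.com");
            fxPanel.setScene(new Scene(webView));
        });

        frame.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
        frame.setContentPane(layeredPane);
        frame.setSize(800, 600);
        frame.setVisible(true);
    }
}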

Edit (30/07/2011): Since the release of JavaFX 2.0 Beta build 36, the sample was broken because the Application.launch method is now a blocking method. I modified it so that the frame initialization now occurs in the start method of the Application class. Also, there seems to be a problem running with JDK 7 in that the WebView does not get rendered, so run this with the latest version of Java 6.


Sunday, August 1, 2010

Java Serialization: Using Serializable and Externalizable and Performance Considerations

I've had to look into some possible performance optimizations for a product lately, and as part of that I wanted to see if there was anything to gain on the serialization/de-serialization front. Therefore, I did a little bit of research on what can be done in terms of customizing object serialization, and I thought I would share the results of my poking around.

Java Serialization is the basic mechanism for serializing your objects into a byte stream that you can use to store or transmit objects. The usages are many, but the usual suspects are storage to disk, RMI and object cloning. Making objects serializable in Java is as simple as making your class implement the Serializable interface. That is at least the theory, since all the fields of said class must also be serializable, i.e. all fields must point to a class that implements Serializable. Should that not be the case, you will quickly discover it at runtime in the form of a NotSerializableException being thrown. This is all very simple and quite powerful; a lot of functionality is open to you just by tagging your class with the Serializable interface. Of course, simplicity usually comes at a price, and in this case you have to pay a performance tax.

There are generally two standard ways of customizing serialization (well there's a third variation which I am showing later):

  • Implementing the writeObject/readObject methods.
  • Implementing the Externalizable interface.

Externalizable gives you full control over the serialization process of an object whereas implementing writeObject/readObject just plugs you into the standard serialization flow. There are differences but most are fairly subtle and not so obvious because they both ask you to implement methods that are almost identical. In the writeObject/readObject case:
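The listing that followed is not preserved here; the signatures required by the java.io.Serializable contract are:

private void writeObject(java.io.ObjectOutputStream out) throws IOException
private void readObject(java.io.ObjectInputStream in) throws IOException, ClassNotFoundException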

and in the Externalizable case:
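This listing is also missing; the two methods declared by the java.io.Externalizable interface are:

public void writeExternal(ObjectOutput out) throws IOException
public void readExternal(ObjectInput in) throws IOException, ClassNotFoundException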

The concept for both methods is the same. You are given an OutputStream to write the state of the object to, and at the other end you get an InputStream to read the state from. Imagine a class with 2 fields; the implementation may look like this:
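The original listing is gone, so here is a sketch of what such an Externalizable implementation could look like; the class and field names are hypothetical:

import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectOutput;
import java.util.Date;

public class TwoFieldBean implements Externalizable {

    private String name;
    private Date birthDate;

    // Externalizable requires a public no-argument constructor.
    public TwoFieldBean() { }

    public void writeExternal(ObjectOutput out) throws IOException {
        out.writeObject(name);
        out.writeObject(birthDate);
    }

    public void readExternal(ObjectInput in) throws IOException, ClassNotFoundException {
        name = (String) in.readObject();
        birthDate = (Date) in.readObject();
    }
}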

The fact that the writeObject/readObject methods are marked private is not a mistake; they have to be, otherwise it does not work. In fact, any mistake in the signature will generate no error during compilation, but the methods will simply not be called at runtime. Although most IDEs will help you nowadays, it is quite easy to make a mistake, whereas Externalizable guarantees a compilation error if you get the method signature wrong. Apart from that, the methods look very similar and would in many cases have the same implementation. The implementation shown above could just as well have been the implementation for writeObject/readObject.

I benchmarked a bunch of different serialization implementations for a simple bean with 12 fields: 3 Long fields, 3 Double fields, 3 String fields and 3 Date fields. This is fairly representative of the objects transferred in the project of interest to me right now. The raw results are shown below. I have chosen two measurements: the time it takes to serialize/de-serialize, and the size of the object when serialized. The test was run on Java 1.6.0_21, 64-bit (server mode), on a standard PC with an Intel i7 920 at 2.67 GHz and 6 GB of memory. The code is available here. You will of course not get the same times from one run to the next, but the proportions should remain the same. Times are averaged over 5000 objects serialized, repeated a number of times. The sizes are also an estimate, because the test beans contain random Strings which, depending on their content, do not serialize to equal sizes. All in all, though, this varies little from one run to the other.


| Bean Used | Serialization (ms) | De-Serialization (ms) | Total (ms) | First Object Size (bytes) | Subsequent Object Size (bytes) |
|---|---|---|---|---|---|
| Standard Serialization | 40 | 30 | 70 | 597 | 201 |
| Dumb Externalizable | 29 | 19 | 48 | 377 | 198 |
| Standard Serialization with Primitive Fields | 17 | 12 | 29 | 410 | 160 |
| Dumb Externalizable with Primitive Fields | 14 | 12 | 26 | 245 | 168 |
| Efficient Serialization | 7 | 7 | 14 | 427 | 148 |
| Efficient Externalizable | 9 | 5 | 14 | 198 | 145 |
| Efficient Externalizable with no null Handling | 8 | 5 | 13 | 194 | 132 |

Okay, so what does all this mean? To better understand, here is a short description of each bean used.

  • Standard Serialization: Nothing special is done except tagging the bean with the Serializable interface.
  • Dumb Externalizable: The bean implements Externalizable, but all it does is call writeObject for each field on the object.
  • Standard Serialization with Primitive Fields: Nothing special is done except tagging the bean with the Serializable interface; the only difference is that primitive fields are used instead of the object wrappers (e.g. long instead of Long).
  • Dumb Externalizable with Primitive Fields: Same as Dumb Externalizable, but primitive fields are used instead of the object wrappers, which also means that we do not use writeObject but, for example, writeLong for these fields.
  • Efficient Serialization: Implements writeObject/readObject and does not use writeObject, but transforms each field into its primitive form first. In the case of the Date objects, the time is taken as milliseconds (a long), and at the other end the Date object is recreated using the time in milliseconds.
  • Efficient Externalizable: Same as the Efficient Serialization case, except the Externalizable interface is used (see the sketch after this list).
  • Efficient Externalizable with no null Handling: Same as Efficient Externalizable, but all fields are assumed to be non-null.
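
As referenced in the list above, here is a sketch of what the "Efficient Externalizable" variant could look like, reduced to one Long and one Date field (the actual test bean had 12 fields, and these names are hypothetical):

import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectOutput;
import java.util.Date;

public class EfficientBean implements Externalizable {

    private Long id;
    private Date created;

    public EfficientBean() { }

    public void writeExternal(ObjectOutput out) throws IOException {
        // Write a null marker plus the primitive value instead of the wrapper object.
        out.writeBoolean(id != null);
        if (id != null) {
            out.writeLong(id.longValue());
        }
        // A Date is reduced to its millisecond timestamp.
        out.writeBoolean(created != null);
        if (created != null) {
            out.writeLong(created.getTime());
        }
    }

    public void readExternal(ObjectInput in) throws IOException, ClassNotFoundException {
        if (in.readBoolean()) {
            id = Long.valueOf(in.readLong());
        }
        if (in.readBoolean()) {
            created = new Date(in.readLong());
        }
    }
}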

A few things seem obvious by looking at the result:

  • Even a dumb implementation of Externalizable does better than the standard serialization.
  • An optimized implementation can save a significant amount of time and size.
  • Using primitives in your data gives a boost to serialization.
  • Serializable always produces a bigger size for the first object.

Now, the first one is to be taken with a grain of salt. It is faster because standard serialization relies on reflection, and even a dumb implementation does better than that. It seems, however, that the more objects are serialized, the smaller the difference between dumb and standard serialization. I suspect this is due to HotSpot doing its job and optimizing the standard code to the level where it effectively matches the dumb implementation. Still, if serialization is not used enough for this optimization to kick in, then doing even the most basic of implementations will save some time.

More interesting are the optimized implementations. You have to be able to do it; if your bean only has primitive fields, you will not get far. However, in the case of more complicated objects such as Date, just sending the millisecond representation can save a lot of time. You do lose information such as the locale, but if that does not matter to you, because all time is set to UTC anyway, then there is much to gain.

Using primitives gives an immediate advantage, but you do lose some information as well: you cannot tell that a field has not been set. In Java, a Boolean field has 3 possible states: true, false and null. I'm not saying that is good, but it actually maps quite well to what a database supports, so the wrapper is likely more useful than the primitive.

If sending the stream over a network, size can be as important as time. Externalizable has an obvious advantage if only one object is sent. This is because standard serialization sends the object definition with the first object. This is not the case when using Externalizable (although some information about the object is automatically sent, such as the type).

So all of this is great; why aren't you already coding your beans with custom serialization? Well, as always, things are not free, and there is a cost to this. The main cost is going to be maintenance, and by that I am not simply referring to the time spent keeping the serialization code up to date, but also the time spent chasing mysterious bugs because someone forgot to update it. I would say that unless a lot of serialization is going on in your application, it is probably not worth it. One way to alleviate the maintenance issue would be to generate the code needed for the serialization.

Okay, so you are going ahead with custom serialization; which method should you use? The impact on time spent is not much different between Externalizable and writeObject/readObject, and the size advantage of Externalizable only applies to the first object. There is, however, one very significant difference between writeObject/readObject and Externalizable: Externalizable promises total control over the serialization of the object, and it actually delivers. This becomes apparent if you are extending another class. Consider the following base class and the classes extending it.
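The original listings are gone; below is a hypothetical reconstruction of the base class, with field names inferred from the sample output further down (the two variants of the extending class follow below):

import java.io.Serializable;
import java.util.Date;

public class BaseBean implements Serializable {

    // Protected so that subclasses can serialize these fields themselves if need be.
    protected String name;
    protected String surname;
    protected Date birthDate;
}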

Now we will implement the serialization for the SerializableExtendingBean, first the writeObject/readObject version:
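A sketch of what that version could look like (a reconstruction, not the original code); only the fields declared in this class are written, and the base class fields are left to the default mechanism:

import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class SerializableExtendingBean extends BaseBean implements Serializable {

    private String company;
    private String position;

    private void writeObject(ObjectOutputStream out) throws IOException {
        out.writeObject(company);
        out.writeObject(position);
    }

    private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
        company = (String) in.readObject();
        position = (String) in.readObject();
    }
}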

Then the Externalizable version:
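And a reconstructed sketch of the Externalizable counterpart; it looks almost identical, but Externalizable replaces serialization for the whole object, base class included:

import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectOutput;

public class ExternalizableExtendingBean extends BaseBean implements Externalizable {

    private String company;
    private String position;

    public ExternalizableExtendingBean() { }

    public void writeExternal(ObjectOutput out) throws IOException {
        out.writeObject(company);
        out.writeObject(position);
    }

    public void readExternal(ObjectInput in) throws IOException, ClassNotFoundException {
        company = (String) in.readObject();
        position = (String) in.readObject();
    }
}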

They look very similar, except that one of them works and the other does not. The writeObject/readObject version will work, but the Externalizable version will produce the following output if used to clone an object:

Unsuccessfully cloned. Fields are missing, original: {company='Doe Inc.', position='CEO', name='Doe', surname='John', birthDate=Thu Jan 01 01:00:00 CET 1970}

Clone: {company='Doe Inc.', position='CEO', name='null', surname='null', birthDate=null}

All the fields in the base class have been reset to their default values. But why were they not reset in the writeObject/readObject version? That is linked to the fact that the writeObject/readObject methods are private. The methods are called not only on the SerializableExtendingBean but also (in essence at least) on the base class, meaning that we still benefit from the default serialization of the base class. To actually make the Externalizable version work, we would have to do something like this:
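A reconstructed sketch of the fix, replacing the two methods in the Externalizable class sketched above (note that it also needs an import of java.util.Date):

public void writeExternal(ObjectOutput out) throws IOException {
    // The base class state must now be written explicitly as well.
    out.writeObject(name);
    out.writeObject(surname);
    out.writeObject(birthDate);
    out.writeObject(company);
    out.writeObject(position);
}

public void readExternal(ObjectInput in) throws IOException, ClassNotFoundException {
    name = (String) in.readObject();
    surname = (String) in.readObject();
    birthDate = (Date) in.readObject();
    company = (String) in.readObject();
    position = (String) in.readObject();
}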

Now you really have yourself a maintenance nightmare with the Externalizable interface: if you add a field to the base class, you have to update every Externalizable class that extends it. Of course, you can make this a lot easier by having the base class implement Externalizable as well and having extending classes call super. This implies that you have control over the base class, which is not always the case.

For this reason, if you have to deal with a hierarchy of objects, I would recommend using writeObject/readObject, simply because the odds of making a mistake are minimized: you do not have to worry about the parent classes. Externalizable is more flexible, but if you are only dealing with simple beans, you are unlikely to really need it.

Earlier I mentioned there was a third way to do serialization. It is more an expansion of the two other methods: the usage of the writeReplace/readResolve methods (again looked up reflectively, and typically declared private). The basic idea is that you delegate serialization/de-serialization to another object. An example below:
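The original listing is missing; below is a hypothetical reconstruction using the class names mentioned in the next paragraph. The internals (the fields and the TRAINEE detection) are invented for illustration, and the trick of reducing the TRAINEE constant to a single byte in the stream is not reproduced here:

import java.io.Serializable;

public final class SecurityClearanceCustomSerializationBean implements Serializable {

    // A shared constant; clones of it keep reference equality thanks to readResolve.
    public static final SecurityClearanceCustomSerializationBean TRAINEE =
            new SecurityClearanceCustomSerializationBean(0, "Trainee");

    // Final fields and no no-argument constructor: the other customization
    // hooks could not rebuild this object, but a serialization proxy can.
    private final int level;
    private final String title;

    public SecurityClearanceCustomSerializationBean(int level, String title) {
        this.level = level;
        this.title = title;
    }

    // Called by the serialization machinery: the proxy is written to the
    // stream instead of the bean itself.
    private Object writeReplace() {
        return new SerializedObject(level, title);
    }

    private static final class SerializedObject implements Serializable {

        private final int level;
        private final String title;

        SerializedObject(int level, String title) {
            this.level = level;
            this.title = title;
        }

        // Called at de-serialization: the proxy is replaced with the real
        // object, returning the shared constant where it applies.
        private Object readResolve() {
            if (level == TRAINEE.level && title.equals(TRAINEE.title)) {
                return TRAINEE;
            }
            return new SecurityClearanceCustomSerializationBean(level, title);
        }
    }
}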

So the object actually sent into the serialization stream when a SecurityClearanceCustomSerializationBean is encountered is an object of the internal class SerializedObject. What is the advantage of doing that? Well, it is the only way I know of to customize serialization for an object with final fields and no default no-argument constructor. Also, in the case of static definitions for default values of the class, such as TRAINEE, we can actually reduce the serialized stream to a single byte; well, at least for the part that we control, as there is always some definition overhead for an object, even an Externalizable one. And if you clone a TRAINEE object, the clone will still have reference equality with the static field.

To finish, an interesting detail that is not, as far as I can see, documented anywhere: what happens if you have a circular reference between two objects? Do you need to do anything? Let's look at the following example:
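The listing is missing here; a minimal reconstruction consistent with the description that follows might look like this:

import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectOutput;

public class CyclicBean implements Externalizable {

    private CyclicBean bean;

    public CyclicBean() { }

    public void setBean(CyclicBean bean) {
        this.bean = bean;
    }

    public void writeExternal(ObjectOutput out) throws IOException {
        // The stream tracks references: an object already written is emitted
        // as a back-reference instead of being serialized again, which is
        // what breaks the otherwise infinite recursion described below.
        out.writeObject(bean);
    }

    public void readExternal(ObjectInput in) throws IOException, ClassNotFoundException {
        bean = (CyclicBean) in.readObject();
    }
}

Two instances wired together with a.setBean(b) and b.setBean(a) can then be written to an ObjectOutputStream without recursing forever.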

Imagine two beans which point to each other via the bean field. When you serialize one of the objects, it will call writeExternal, which will serialize the other bean via the writeObject call; this triggers writeExternal on the other bean, which should again try to serialize the first object. This, however, does not go into an infinite loop. The reason is that writeObject does not re-serialize an object if it has already been called with the same object reference before. This also means that if the object has changed internal values in the meantime, too bad: the stream is not updated. It seems reasonable, though, because solving the cyclic reference problem would be quite painful to deal with every time you want to do custom serialization.

That was a little bit more than I had planned for, but I hope you will find this useful if you ever want to do some custom serialization.

Sunday, July 6, 2008

Why you should take a look at Scala and where to start?

Scala is advertised as the successor to Java by some and as an experimental ground for new features by others. According to its creator, Martin Odersky, it is supposed to be the swiss army knife of programming languages: something that is a scripting language and yet still offers all that Java has and more. Calling it a scripting language references the usual characteristics of such languages, namely dynamic typing and low verbosity (Scala is NOT dynamically typed; it does type inference, which in practice means that you rarely have to declare the types of your variables). It is very difficult at this point to determine if Scala is going to be widely used, so why should you spend time on Scala? Why not Groovy or Ruby? The two latter languages are definitely worth looking at; however, they truly are scripting languages, and most of their popularity is due not so much to the languages themselves as to their application in Rails-style frameworks. If you are already familiar with scripting languages, then these languages offer very few new features, mostly polished versions of existing concepts (Ruby is about as old as Java, so very few things in Ruby are actually new). Scala, on the other hand, has a number of features that are a bit more original, or at least cutting edge, so even if Scala doesn't make it as a mainstream language, a lot of its features are likely to appear in whichever language does. So why not get a head start? Now, I am in no way a language expert, I am a practitioner, not a theorist, but this is the general feeling I get from the people I talk to and from what I read around the net.

So where should you start? To understand the why of Scala and the principles it was built on, I can recommend the presentation that Martin Odersky gave at JavaOne. Audio and the slides can be found at the Sun Developer Network here (subscription required). I was at the session, and I can testify that, looking around the room, quite a few of the big shots of the Java world were attending (such as Joshua Bloch and Brian Goetz). There is also a light introduction aimed at Java developers called Scala for the Java Refugee, which I found very useful for getting a quick start, as it maps Java concepts to Scala concepts. After that you can move on to the more Scala-specific features, which are appropriately presented in the Scala tutorials on the official site. I found the tutorials very concise and a good introduction to all the major concepts of Scala. As for books, the only one I know of at this point is the Programming in Scala book. It is due out at the end of July, but you can already get a PDF version and preorder the book. I've heard positive things about it and have preordered the paper version.

So after reading all this, what next? Well, as for everything else in computer science, you have to start coding. Starting up is fairly easy: you can follow the documentation on the official site, and you should be compiling and running in no time. There are Ant and Maven targets available, so integrating it into an existing project is fairly easy. As for which IDE to use, I've tried NetBeans, Eclipse and IntelliJ IDEA. They all have Scala plugins at more or less advanced stages. Although I am a big IDEA fan, the Eclipse plugin is for now doing the best job, although it is by no means very stable. I am, however, going to keep a close eye on IDEA 8 and the Scala plugin that is scheduled to come out shortly after. Scala is completely compatible with existing Java libraries, so you won't have to reinvent the wheel for every part of your program. This goes for all libraries and not just the trivial ones; I've successfully used Scala with Hibernate, for example. You should have no problem running inside an application server either. Therefore, there is no reason to start from scratch with Scala: you can just take a component of your application, implement it in Scala, and keep the rest of your existing code.

Personally, I have implemented an applet using Scala, and I found the language very pleasant once you get used to the syntax. You can do some things very efficiently, and I find that the language is generally very readable. Java is definitely very verbose compared to Scala, and I found that having no checked exceptions, plus closures, really makes the code focus on the "what" of the program and much less on the "how". The use of actors for Swing is also very interesting and made for some really readable event-driven code. I can warmly recommend that you give this a shot, if only to get a perspective on different programming constructs.

Sunday, January 27, 2008

The Java Mobile & Embedded Developer Days

This week I had the privilege of going to the Java Mobile & Embedded Developer Days in Santa Clara, near San Francisco. It was a most pleasing experience, and taking into account that this was the first instalment (and hopefully not the last), it all went surprisingly well. Now, mind you, this is no JavaOne; I think there were between 150 and 200 people at the conference over the two days it was scheduled for, so it is at the smaller end of the scale. However, size doesn't always matter, and when it comes to conferences, smaller is sometimes better, because you can go more in depth with specialized topics. That was not necessarily true for this one, though, and in my opinion it may not have gone as much in depth as it could have. A disclaimer before moving on: I am not in any way an experienced embedded systems specialist and have very limited experience with Java in that field. I've done some C and assembly coding in that area, but no Java development. Yet that did not matter much at this conference, even though it was touted as being aimed at intermediate to experienced developers, which left me wondering a little bit. Okay, so let's dive into what happened.

As is customary at a lot of Sun-sponsored conferences, James Gosling gave the keynote. Nothing revolutionary in his talk (there was even the usual: stop using emacs, damnit!): the mobile platform is the desktop of tomorrow, and we'll end up having several billions of devices, so let's put the ease of development that Java offers in there. As a guy having coded C on embedded systems (or any other system), I certainly will not disagree with him on this point. The performance of Java has at this point surpassed C and C++ in a lot of performance benchmarks, so this should be a non-issue by now. However, there still remains the large footprint of a JVM in terms of memory and power consumption, but more on that later. Announced was the open sourcing of Project Squawk, which is a Java-based implementation of the JVM (chicken and the egg problem, anyone?), but besides that it was your average opening keynote. I wasn't able to find the slides online anywhere, maybe they died with James's laptop (at least I would expect that one's not going to have a long lifetime, given the amount of complaining from James), but you can find all the other slides here. After that it was on to the technical sessions.

Since this was a fairly small conference, there were never more than two concurrent sessions at a time, so you would never miss too much of what was going on. The first session was about Java ME security domains. It was an attempt to address the frustrations that developers have with all this damn security that prevents them from doing what they want. Listening to the presentation, it reminded me a lot of the trouble with applets, although there it could almost always be solved by just signing your application jar. With Java ME you can also sign your application, but that guarantees almost nothing, since it is left up to the actual phone manufacturers and network providers to define the security policies. That means that if your signed application works with one provider (although most likely with a lot of warnings and prompts thrown at the user), you have no guarantee it will work with another provider, or even with future phones from the same provider. This was a recurring theme of the conference: the providers (and the manufacturers to a certain degree) have made it pretty much impossible to easily port your application from one provider to another by adding a lot of customized settings to their platforms. Frustrating...

Next up was a presentation of NFC (Near Field Communication), which is basically a short-range communication protocol (like 4 cm). The idea is that you just bring your device up to another NFC device and they can interact to, say, transfer business cards or process a payment at the store. The idea is that, since the range is so short, it is more secure (yeah, I already see myself curled up in the corner of the metro to avoid virtual mugging through close contact). The session introduced JSR 257, which specifies what I would call a very standard API with listeners, factories and extensibility through interfaces. It looks easy enough to use and potentially has a ton of possible applications, so now all we need is for some actual devices to go out there and use it.

On came Sun and their Sun SPOT team, and that is when the little geek in me got its biggest "Wow! This is cool!" moment of the conference. A Sun SPOT is a small device that is equipped with an ARM processor, temperature and light sensors, an accelerometer, wireless, LEDs and a bunch of I/O ports. So what can you use that for? Well, everything you want; it is basically a prototyping platform for all kinds of embedded systems. It is easy to use, and you will not have all the usual hardware problems to deal with when first trying out your idea. It is fairly cheap ($550 for 3 SPOTs and a base station) and comes with a fairly impressive NetBeans integration. It is pretty much a geek's dream when it comes to toys :-) Yet I can definitely see the commercial potential of providing hardware for companies to do quick prototyping of new products. At the conference, there was a demonstration of how these had been used to create a bunch of cheap robots to study swarm behavior. Very cool!

The rest of the first day's technical sessions were less interesting; they basically boiled down to a NetBeans tutorial (after all, it is a Sun conference). Granted, NetBeans is getting better and better, and version 6 definitely crosses the line between annoying and usable, but come on, this is something you can go read a tutorial to figure out. While one session about it is okay, let it go, we'll go check it out, stop trying to push it down our throats.

To conclude the first day, and before going to Maria Elena's Mexican restaurant for a sociable evening, there was a panel discussion with representatives from phone manufacturers, network providers and a few software people. The moderators ran a script about what a developer needs to do to get to the mobile market with some new software. This was all very depressing, because what the panel was saying is that you basically need to bow to the network providers. They really don't want you on their phones without them making money out of it. They want to be the physical distributor and the content controller, a little bit like the music industry. Basically, you buy a phone, but you can't just do what you want with it. They were interrupted midway by a rebellious voice from the audience wanting to direct the conversation towards "Why bother making software for mobile phones?". In my opinion, the core of the problem is that even though the manufacturers have an incentive to get creative software on their phones, the network providers have killed creativity by closing the platform completely to external sources (at least in the US; Asia and Europe seem to have moved beyond that and at this point have a lot more mobile applications than the US). So sad, but I cannot help but wonder what the iPhone will do to this business model after the release of the SDK next month. One can hope that it will force operators to open up to third-party applications.

After a good night's sleep, it was time for the second day of the conference. Sadly, it was only 2/3 of a day for me, because I had to leave early to beat the snow storm and get back to Los Angeles. So most of what I saw was dedicated to JavaFX Mobile and phoneME. JavaFX Mobile is not just a mobile implementation of JavaFX the scripting language; it is a complete operating system for the phone, much like... well, phoneME. So why is it that we have two platforms competing within Sun? I do not know, and that wasn't really properly addressed. We already have a segmented platform on the mobile market, so why add even more? The talks were somewhat interesting, but it was hard to get excited by yet another phone operating system.

Lastly, there was a presentation of Project Squawk. This was by far the most interesting presentation for me. The goal of the project is to rewrite as much as possible of the JVM in Java code, to make it easier to port to other platforms, embedded platforms among others. Another advantage is that it becomes a lot easier to cut parts of the JVM away and keep only what you need to run your application on the given platform; sort of a customized JVM. Really interesting, and since it is now open source, everybody can go check out some JVM code without being a C expert. Really cool.

So what is the conclusion for this conference? If I had to sum it up in as few words as possible, it would be: Embedded: Cool! Mobile: Sad. The complete fragmentation and lock-down of the mobile platform makes it really unattractive for developers, which is really sad, since there are a lot of cool ideas to explore. So for now, I would stick with embedded if I had to go smaller.

Saturday, January 19, 2008

What are closures and what do they mean to Java?

I will try to show an example of what closures can be used for and what this functionality looks like if implemented in the current version of Java. I will be using the Scala programming language to demonstrate a closure. Scala is a brand new language that runs on the JVM. I am by no measure a Scala expert; actually, prior to writing this post I had only spent a few hours looking at the language. Therefore, if you are a Scala guru, please be gentle when commenting about the code.

Before showing you the code, let's first define what a closure is. According to Martin Fowler: a closure is a block of code that can be passed as an argument to a function call. It's not a new idea; it has been around since Lisp in the 60s and later Scheme. It was usually associated with functional programming rather than object-oriented programming, but it now exists in many OO languages like Ruby and C# (2.0). So what is it that you can do with them that you cannot do in Java now?

Let us take an example of what a closure could be used for. File manipulation is a common task in any language, and in many languages it requires resource handling of some sort. Doing file operations can usually be abstracted to: open the file, do your stuff, close the file. If you forget to close the file, this might lead to your process running out of file handles. So let's try to abstract resource management a bit and write a method on a file that will let you perform an operation on each line of a file without having to worry about opening and closing it. The example below is, as I said, in Scala. One of the very nice features is that you can reuse all existing Java classes, so I'll extend the File class and add a forEachLine method to it. This method allows you to pass a block of code that will be executed on each line of the file. So here is the new class:

package ScalaApplication2

import java.io.File
import java.util.Scanner

class ScalaFile(filename: String) extends File(filename) {

  // Applies the given function to every line of the file, hiding the
  // opening and closing of the underlying Scanner from the caller.
  def forEachLine(parseLine: (String) => Unit) {
    val scanner = new Scanner(this)
    try {
      while (scanner.hasNextLine()) {
        parseLine(scanner.nextLine())
      }
    } finally {
      scanner.close()
    }
  }

  // Stub presumably needed to satisfy the Scala compiler of the time when
  // extending java.io.File (which implements Comparable).
  def compareTo(other: Any): Int = {
    0
  }

}


So what is going on here? We've extended the File class and added a new method called forEachLine. It takes one parameter, named parseLine, which is defined as a function type that takes one parameter of type String and returns nothing (Unit is Scala's equivalent of void). As you can see, it is very easy to define a function as a parameter to a method in Scala, and it is equally easy to use it inside the while loop. So now that we've implemented this new method, let us look at how it can be used.
package ScalaApplication2

object ScalaMain {

  def main(args: Array[String]) = {
    val file = new ScalaFile("C:\\testFile.txt")
    var charCount = 0
    file.forEachLine((line: String) => {
      println(line)
      charCount += line.length()
    })
    println("Number of characters in File: " + charCount)
  }

}


So here we use the forEachLine method to print each line and count the number of characters in the file. To do this we create an anonymous function, and as can be seen it is fairly straightforward. Note that the variable charCount is defined outside the scope of the anonymous function and yet can seamlessly be used inside its code block. Keep that in mind for later.

Our goal of hiding resource management is reached, since opening and closing the stream is encapsulated inside the forEachLine method of the ScalaFile class. Now let us look at how we can do this in Java. To get closure-like behavior in Java, we need to use interfaces and anonymous classes. Below is the implementation of the new File class that we will use.

package javaapplication1;

import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;
import java.util.logging.Level;
import java.util.logging.Logger;

public class JavaFile extends File {

    public JavaFile(String filename) {
        super(filename);
    }

    public void forEachLine(JavaFileCallback callback) {
        Scanner scanner = null;
        try {
            scanner = new Scanner(this);
            while (scanner.hasNextLine()) {
                callback.parseLine(scanner.nextLine());
            }
        } catch (FileNotFoundException ex) {
            Logger.getLogger(JavaFile.class.getName()).log(Level.SEVERE, null, ex);
        } finally {
            if (scanner != null) {
                scanner.close();
            }
        }
    }

    public interface JavaFileCallback {

        void parseLine(String line);

    }

}



Since we cannot pass functions or methods as parameters (at least not in a type-safe way), we have to use an interface with a callback method. Still, our goal is accomplished: the stream is opened and closed inside our forEachLine method. Now let's look at what the caller has to do.

package javaapplication1;

import javaapplication1.JavaFile.JavaFileCallback;

public class JavaMain {

    public static void main(String[] args) {
        JavaFile javaFile = new JavaFile("C:\\testFile.txt");
        final int[] charCount = {0};
        javaFile.forEachLine(new JavaFileCallback() {
            public void parseLine(String line) {
                System.out.println(line);
                charCount[0] += line.length();
            }
        });
        System.out.println("Number of characters in File: " + charCount[0]);
    }

}


In Java, instead of an anonymous function, we have an anonymous class that we pass to our forEachLine method. Now, we all know that local variables from the enclosing scope cannot be accessed inside the anonymous class unless they are declared final. So how do we pass on our charCount variable? Well, we have to use a final object reference whose contents we can mutate; in this case, we simply use an array. This adds some overhead, but it is the only way to do it.

I've just demonstrated that we can replicate the exact behavior of closures with interfaces and anonymous classes. I will even go as far as saying that anything a closure can do, you can implement using the current version of Java (I will stand by this until proven wrong :-). However, that is not really the point. This is a very simple example, and there are more complex uses of closures which require even more complex code in Java. The point is that you can do some fairly complex things with little, simple code in a language that has closures built in. In Java you can do it, but let us be honest, it does not look pretty. Yet we do it every day. Think of how many times you've had to create a Runnable or Callable simply to pass it on to an execution service. It would be much nicer to simply pass blocks of code around instead of actual classes and interfaces.

This is not intended to add anything new to the debate about closures in Java, just to give you an idea of what they are through a simple example. If you're interested in knowing more about closures and the many problems that need to be resolved if we want them in Java, you should check out Neal Gafter's blog. He is one of the driving forces behind the current proposal for Java and has many more examples and in depth discussions about the pros and cons (let's face it mostly pros, but it's still good reading).

Sunday, September 16, 2007

JMS: So simple yet sometimes so tricky

The Java Message Service (JMS) API, as provided as part of Java EE, appears to be a small, limited-scope API. However, appearances can be deceiving; experience has shown that a lot of developers make small but far-reaching mistakes with it. JMS is a useful tool that allows you to easily implement asynchronous communication between components. It also allows for cross-platform integration, with implementations of the API in C and C++. Therefore, to some extent, JMS is the architect's golden axe for many problems like cross-system integration, scalability and reliability. Not as buzzword-compliant as web services, but still very useful and widely used. So how does this translate in the hands of the developers using it?

First, let us briefly summarize what JMS is about. It is a connection-based messaging system with two main abstractions for messaging channels, namely topics and queues. Queues and topics have very different semantics: queues are for one-to-one communication, and topics are for one-to-many communication. A queue will buffer a message until a consumer, well, consumes it. A topic will not hang on to a message and delivers it only to the consumers listening at the time the message is produced (the exception being durable subscriptions). To be able to receive or send a message you need four things: a connection factory, a connection, a session and a destination. The connection factory and the destination are acquired through a lookup in the InitialContext (I'll skip the entire setup of resources). A connection is acquired through a connection factory, and the session is acquired through the connection. Once you have acquired all these, you can create a consumer or a producer from the session. As can be seen, a lot is involved before you can send a message or even start listening. However, once you know how to do it, it's pretty much the same each time, so it should be tedious, but trivial. Right?
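To make that setup concrete, here is a minimal sketch of sending a message with the JMS 1.1 API; the JNDI names are hypothetical and depend entirely on how your provider is configured:

import javax.jms.Connection;
import javax.jms.ConnectionFactory;
import javax.jms.Destination;
import javax.jms.MessageProducer;
import javax.jms.Session;
import javax.jms.TextMessage;
import javax.naming.InitialContext;

public class JmsSendExample {

    public static void main(String[] args) throws Exception {
        InitialContext ctx = new InitialContext();
        // The JNDI names here are placeholders; they depend on your provider's setup.
        ConnectionFactory factory = (ConnectionFactory) ctx.lookup("jms/ConnectionFactory");
        Destination queue = (Destination) ctx.lookup("jms/ExampleQueue");

        Connection connection = factory.createConnection();
        try {
            // false = not transacted; acknowledgments are handled automatically.
            Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
            MessageProducer producer = session.createProducer(queue);
            TextMessage message = session.createTextMessage("Hello, JMS");
            producer.send(message);
        } finally {
            // Closing the connection also closes the sessions and producers created from it.
            connection.close();
        }
    }
}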

Well, as it turns out, not really; at least not if you take into account the number of bugs generated by developers working with JMS. Don't get me wrong, it is not a dark abyss of software bugs, but compared to the simplicity of the API, it seems surprising. So what are usually the big sinners? I've mostly encountered three: thread safety, resource management and fault tolerance.

So let us start with thread safety. Now, if you have read the spec, you know all about it; it is actually explained in there how you are supposed to do things. That is where most architects say: "Great! It's all taken care of, thought through and ready to use by the developers." The big mistake here is that most developers don't read the spec. I have to admit to not reading the spec for everything that I use myself, but the fact is that I probably should. Actually, it took my first mistake using JMS to get me to read the spec. In there, it is VERY clearly stated that Session and MessageProducer/MessageConsumer are not thread safe, yet a lot of developers just fire them up and cache them without any regard to the number of threads going through there. On a client you might get away with it most of the time, although you will have some mystery bugs. On the server, you'll probably notice quickly, because your transactions will start misbehaving if you share sessions. So how to fix this? You could force all developers in your organization to read the spec, but forcing people to read 120+ pages rarely does any good. So I would propose adding a simple requirement to the programming guidelines of the project (I'm not talking about the guideline that tells you where to put the braces, I'm talking about the useful one not written by QA): simply require all Sessions and Producers/Consumers to be local variables. Local variables are by definition thread safe, and this requirement is fairly easy to check with static code analysis.

Okay, let’s move on to resource management, JMS is quite resource heavy. Connections, sessions, producers and consumers are all resources, by which I mean you actually have to close them once you are done. If you don’t, you have a resource leak on your hands, which can be difficult to track down. There isn’t really any good constructs in the Java language to prevent leaks from the framework perspective. You have the try-finally construct, but then you are relying on the clients to prevent the leak. Closures might make an appearance to help this later on, but it still has to be seen if it ever makes the cut. However, static analysis once again can help detect potential bugs, if your close call is not in a finally block (or not there at all), you probably have found a bug. On an application server, managing resources is easy, since it is up to the application server to manage them, you just have to give it a chance to do it. That means setting up a connection pool on the application server, not hold on to the connection and remember to close the connection. To hold on to the connection is a common mistake, after all you are supposed to fetch resources on ejbCreate() and release them on ejbRemove(). Well, that would be a mistake, because a JMS Connection is not expensive to get if it is in the pool is up and running, but by holding on to it, you might force the application server to create new connections which is expensive at creation and to maintain. Therefore, if on the server the same recommendation than for Sessions in the previous section applies, always make your connections local variables. On the client, things are a little bit more complicated. Nobody is managing resources but you, so you have to be a little bit more careful. You could just create your connections and release them right away when you are done. This would work well if you only send messages from time to time. Of course, you can’t do that for listeners. If your clients are heavily using JMS, you might want to implement your own pooling mechanism. This is easier than it sounds, basically all you need to do is put a wrapper around the ConnectionFactory, and since you probably already have a service locator implemented, you could have it return the wrapper instead of the vendor’s connection factory. The advantage of this is that no matter how well the vendor’s implementation scales you can tweak your end of the system.

Last but not least is fault tolerance. This is actually not part of the JMS spec; the authors did not want to put up rules for what should happen when services start failing for one reason or another. For example, nowhere does it specify what to do if a queue fills up (i.e. the consumer is not there anymore). So implementations range from throwing an exception, to blocking, to just throwing away messages. With these kinds of conditions, it is difficult for the developer to know what to do, since it will depend a great deal on the JMS provider's implementation. This is probably the least compelling part of the JMS spec: it defines the success scenarios very well, but when it comes to the edge-case failures, it leaves them to the implementation. Flexibility is good, but it makes it a pain to switch from one implementation to another. A tempting solution is to wrap the vendor's implementation with your own implementation that enforces a certain behaviour. This is not very satisfying, because it defeats the out-of-the-box solution that JMS should be. The only recommendation is to read the vendor's documentation very carefully before choosing (or switching to) a provider, to make sure that it offers satisfying fault tolerance mechanisms for your project.

So all in all, while I still consider the JMS spec to be very well designed, there are a few practical matters that make it difficult to get right in practice. Some are due to developer "laziness", but others, like the total lack of specification of any fault tolerance or reliability mechanism, are embedded into it. However, nothing that cannot be solved with a little discipline on both the architects' and the developers' side; so, with this in mind, it should be a tool of choice in your asynchronous communication toolbox.