Category: Programming stuff

SECOQC: We’ve done it

Posted on October 12, 2008

Over the last one and a half years I’ve been involved with the SECOQC project. Its goal was to provide a prototype of a quantum key distribution network. Such a system would provide unconditional security and thus, unlike traditional cryptography, wouldn’t be vulnerable to improvements in computing power.

The final presentation of the prototype happened this Wednesday. The last days and nights before it were filled with applying the last fixes, but in the end it was worth the time. But let the newspapers do the talking: orf, heise, der standard, sueddeutsche, Austrian Telekom News. There was quite good news coverage in German-speaking Europe (and some Eastern European countries), but sadly the news doesn’t seem to have jumped over the pond (at least some American physicists were at the presentation, so it got noticed anyway).

Feels strange to know that something that big and cutting-edge is finally successfully finished... and that I’m an unemployed student again.

Ruby on Rails vs. Java, part 2

Posted on February 24, 2008

As mentioned in part 1, my task is implementing a mail and campaigning module. I started out with a pure Ruby on Rails application and moved to a mixed Java/Rails application, as Ruby’s communication libraries were just lacking mandatory features. I chose Java/Hibernate/Spring because the combination should provide me with most of the needed features through an easily extensible and configurable framework. What are my findings so far?

Ruby on Rails vs. Java, part 1

Posted on February 17, 2008

One of the components of the system developed here at Blackwhale is a fairly advanced web mail/campaigning and analytics system. The first iteration of that component was fully implemented in Ruby on Rails. Writing the front end was fairly easy and fast, a perfect opportunity for Rails to display its strengths. On the back end and communication side, on the other hand, I stumbled into various problems:

  • Compared to PHP, Python or Java there are just too few communication libraries. Even the libraries that exist are lacking fundamental features, and almost all of them come without usable documentation. That there is no encryption (SSL or TLS) support in any of the Ruby 1.8 mail libraries is an outright shame. The IMAP library is so clumsy that a web company sells its own (still not perfect) library through its online store, and they seem to make a good buck with it.

    On the Java side the picture is completely different: lots of libraries are available, even documented ones. After having to touch the TMail API (TMail is Rails’ MIME mail handling library), using javax.mail is a heavenly gift. And it seems that the MIME messages generated by TMail were invalid in a couple of cases. Not the best feature of a support library. The chance of running into known but unsolved bugs and errors also seems to be a lot smaller in Java.

  • This brings me to another point: I might change my opinion on Java’s constraints on its users. Java tries hard to prevent errors (e.g. through forced exception catching). I always thought that this took too many stylish possibilities away from the user, but by now I must confess that it is exactly what I want from something that I’m using on the network side. I’m a lazy programmer, I want to be reminded and forced to write secure and stable code. This is quite different from Ruby’s and Rails’ ‘make it easy for the programmer’ attitude.
  • Background processing is hard. As Ruby on Rails is not thread-safe, you can’t just spawn a thread when you need to perform a longer-running task. Another disadvantage of the single-process model is that a long-running request occupies one Rails worker (i.e. one process of the Rails cluster) until it has finished. In our case that costs around 60MB of memory per long-running request, even if it is just waiting for some simple SMTP feedback. If you can find a situation where you can delay the execution of a network-related task for two seconds (and if you can’t, you’re not thinking), 6 requests per second will DoS a standard Rails cluster.

    The only solution that’s actually usable is BackgroundRb. But a project that has just changed its background communication system doesn’t sound production-ready to me. Also, the admin start/stop scripts for its background server didn’t work too well for me.

  • For a framework that infers its objects’ attribute types directly from the database, the ActiveRecord layer is weak. Don’t get me wrong, I understand that the simplicity is needed to make it easily usable, but I ran into various situations where I’d have loved to have a full-blown ORM behind me. One feature that our application needs quite often is inheritance. ActiveRecord only offers single table inheritance, and even there you have to make sure yourself that each row is valid (ActiveRecord should have all the information needed to do that by itself, BTW) or you will run into problems later on. One problem is that it tries to abstract too much functionality away from the database while not providing equally advanced interfaces itself. Data integrity handling? Abstracted away by Rails, so that all databases can be used the same way. The drawback is that the data constraints and relations are handled entirely by Rails and not passed on to the database. Any process that produces invalid data (e.g. a faulty Rails component) might corrupt the data. Rails is able to cope with those cases by itself (due to duck typing and very few default checks), but access that data with any other framework and it blows up right in your face.
  • Transaction handling. Just try it. Then cry. Also, I’m not sure whether transaction handling is even done at the database level or in Rails (as is done with constraints). If the latter is true, it’s actually not worth anything as soon as more than one process tries to access the database.

The library and documentation problems were the main reason for me to reimplement the mailing and campaigning back end in Java. The front end is still a Rails application, which is exactly what Rails is for. As I’m no friend of bloated EJB-based solutions, I’ve chosen a simple Spring- and JPA-based solution for that problem.

I’m currently testing the last features and replacing the Ruby code part by part. As soon as I’ve done that another blog post will examine the two implementations, how much time was spent on coding them and how they perform when compared to each other.

How to enable TLS support in Ruby’s/Rails’ Net::SMTP

Posted on December 19, 2007

Ruby on Rails runs on Ruby 1.8, whose Net::SMTP still lacks support for STARTTLS or SSL. Support was added in Ruby 1.9, but as that release is designated as unstable (merely a step on the road to a stable Ruby 2.0), Rails 2.0 doesn’t support running on it.

What to do? It’s a shame that a web framework like Rails doesn’t support secure transmission of mails.

So I rolled my own Ruby on Rails plugin that extends the base Net::SMTP class with STARTTLS features. You can just call Net::SMTP.enable_tls (or do the same with an instance variable) to enable secure communications.

Honestly, I didn’t write all the code on my own: I searched the net and adapted some patches that were floating around into this plugin, as this is the easiest way of providing the needed TLS support to Rails.

So if you need it, just grab it from here. More information can be found in the plugin’s README file. Have fun, and share your improvements to the plugin.

How to upgrade to Ruby on Rails 2.0

Posted on December 10, 2007

I wanted to upgrade one project of mine to Ruby on Rails 2.0 to automatically get some security and performance improvements. So how to do it? My first try was:

  1. check out a new copy of my project from the SCM
  2. upgrade the Rails version in use to EdgeRails
    (by executing “rake rails:freeze:edge” twice)
  3. adapt the RAILS_VERSION variable in config/environment.rb
  4. get a “500 Internal Error” without any usable logging information on starting the Rails server (“script/server”)

So what? I upgraded everything as planned and got an error without any real debugging help. What to do now?

After some googling around I found Mislav Marohnic’s excellent r2check.rb script. Just run it in your Rails directory and it will report a lot of errors and deprecated features that might hit you. One of the reported errors was the new notation for singular resources; after converting my resources.rb the application finally started (although I had to fix some errors in the restful_authentication plugin, but nothing too heavy). So now I’m running Rails 2.0.1 and some pages really feel a lot faster.

Thank you very much for that script!

Ruby on Rails redux

Posted on November 12, 2007

Approximately one week ago I claimed that I was happy replacing Ruby on Rails with a conglomerate of Java frameworks. I’ve grown wiser. Even with maven2 it’s too hard to get even a simple mavenized spring2, acegi, spring-jpa-hibernate configuration to work.

So I’m back to Rails. By now I feel more comfortable in the framework. There are still some open issues, but most things just work (TM).

You’re an atheist?

Only on Christmas or Easter,
the rest of the time it doesn’t really matter

House, M.D.

Using Apache ActiveMQ

Posted on September 18, 2007

I use Apache ActiveMQ for internal component communication in a project of mine. As only local (no distributed or network) communication is needed, I configured it to use the VM transport, the fastest and most efficient way according to the online documentation. To prevent overhead, objects instead of serialized XML are sent through the queue: profiling has shown that using JDOM’s XMLOutputter to generate XML is a major CPU hog. Another advantage of using the VM transport is that no dedicated broker is needed, the internal one is started automatically. So everything is peachy?

Unfortunately not.

The first profile run (using Profiler4J) showed unusually high CPU load while the system was idle. Further analysis showed that ActiveMQ’s message receive function polls to get messages from the queue. This only affects the framework when it is idle and shouldn’t use as much CPU when there’s actual work to do, but it is still highly disturbing. After investigating the ActiveMQ documentation again (on a side note: there seems to be something that automatically destroys anything resembling documentation as soon as a project joins Apache) I found out that there’s another way of receiving messages. As it is an asynchronous, listener-based approach, it should be easier on the processor. As a nice side effect, the receiver side of my application got simpler too.

Better results? No.

More than half of the system’s CPU usage in the “now-not-idle-mode-anymore” is still generated by polled message queues. My solution to that problem (due to lack of documentation) was ripping out the ActiveMQ part altogether and replacing it with a simple java.util.concurrent.BlockingQueue. It is thread-safe, simple to use and provides a sort of sender-side blocking (through a predefined queue size). In addition the data structure is generic, so some type casts are avoided. CPU usage went down, a lot!

So what am I missing in the ActiveMQ picture? As it claims to be enterprise-grade this should not happen, so I’m expecting an error on my side.

object-relational mappers in Java

Posted on August 26, 2007

After some time of absence from the Java/XML buzzword world I had to reenter it lately for my master’s thesis. My current assignment is writing a data abstraction and persistence layer. After finding out that there’s a multitude of possible frameworks and systems which I might use for my problem I dived right in. In contrast to BPEL engines, most of them also worked.

The first step towards an advanced persistence solution was the use of an ORM. As the name implies, object-relational mappers map between Java objects and relational data structures (i.e. databases). The programmer doesn’t see the underlying database world and just interacts with plain old Java objects. I started by using iBatis, which occupies one extreme of the ORM spectrum: the developer has to supply a SQL statement for each mapping. This also implies that the database schema is already defined before the ORM is used. As I started from scratch I didn’t have a relational model; in fact I wanted to use an object-relational mapper to keep myself from micro-designing the database layout.

The next contestant was Hibernate. This is one of the most powerful object mappers for Java, if not the most powerful. In contrast to iBatis, the latest version depends solely on Java annotations and a simple config file (which defines the overall database connection and declares which classes are to be mapped). I annotated my Java files and was ready to roll. After a bit of tweaking lazy loading also worked, which isn’t too shabby for the invested time. Another advantage of Hibernate is HQL, the Hibernate Query Language: it provides the developer with an easy way of extracting objects from databases through SQL-like queries.

But finally I settled on the Java Persistence API, which is roughly the same as Hibernate but standardized through a JSR. There were minor differences in the annotations, but overall it was even simpler to use (the configuration file that stated which classes were to be persisted fell away). Under the hood I used Hibernate, but just through JPA, so performance and stability were the same.

The only drawback so far is the memory usage. JPA just takes all of it (and the JVM’s behaviour of starting memory reclamation as late as possible doesn’t help either). To ease this I had to blur the line between the persistence and business layers a bit, which felt like a step backwards, as the goal was to keep the persistence layer as small as possible.

As one of my thesis’ mentors prefers XML databases, I will be able to contrast this with an XML-based solution soon.

Our solution to the JBoss Problem

Posted on January 12, 2007

Finally, we’ve finished our solution for the Internet Application assignment at the TU Wien.

We didn’t find much good documentation on the net, so we decided to put our final submission archive up on this page. It is written for Java 6, using JBoss 4.0.4 in conjunction with JUnit/AXIS 1.4 client test cases. There’s also a .NET C# web service client and a rudimentary BPEL file. The submission document includes some error descriptions and our preliminary solutions to those problems.

It includes:

  • JBoss JSR-181 enabled Entity and Session beans
  • build file for building and deploying the beans
  • Java unit tests using AXIS 1.4/JUnit
  • .NET C# web service client
  • BPEL workflow using our web service (doesn’t work)
  • summary document

Hopefully this helps the next students that have to do that crappy work. So get it here.

network stuff I didn't learn at school..

Posted on January 4, 2007

While I learned the basic read/write (or send/recv) way of dealing with socket data in school I never heard of the more sophisticated and faster readv/writev stuff.

Read more for a simple example of how you can improve your network code.

So, I was implementing a network server over the last few days. From university and school I knew the socket send/recv way of dealing with things, which mostly goes this way (without error handling):

int fd = socket(AF_INET, SOCK_STREAM, 0);
uint32_t header = htonl(XXX_SOMESTUFF);
uint32_t other_header = htonl(1234);
uint32_t length = htonl(LENGTH_OF_DATA_PACKET);
void* data; /* data with length length */

connect(..);

/* one syscall per field; note that the descriptor is fd, not socket */
send(fd, &header, sizeof(header), 0);
send(fd, &other_header, sizeof(other_header), 0);
send(fd, &length, sizeof(length), 0);
send(fd, data, ntohl(length), 0);

While this works, it uses quite a lot of syscalls (at least one for each send) and overall performance is rather bad. So the next step includes creating a new memory area and sending the whole as one:

int fd = socket(AF_INET, SOCK_STREAM, 0);
uint32_t header = htonl(XXX_SOMESTUFF);
uint32_t other_header = htonl(1234);
uint32_t length = htonl(LENGTH_OF_DATA_PACKET);
void* data; /* data with length length */

connect(..);

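/* length is in network byte order, so convert it back for size arithmetic */
size_t data_len = ntohl(length);
size_t total = sizeof(uint32_t)*3 + data_len;
char *senddata = malloc(total); /* char*, so the pointer arithmetic is valid C */
memcpy(senddata, &header, sizeof(header));
memcpy(senddata+sizeof(uint32_t), &other_header, sizeof(other_header));
memcpy(senddata+sizeof(uint32_t)*2, &length, sizeof(length));
memcpy(senddata+sizeof(uint32_t)*3, data, data_len);

send(fd, senddata, total, 0);
free(senddata);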

This minimizes syscalls, but we need to allocate new memory (and perform an insane number of memory copies). That is also nothing that looks too good in a high performance server.

Enter scatter/gather I/O. The readv/writev functions do not need their data in one contiguous memory area:

int fd = socket(AF_INET, SOCK_STREAM, 0);
uint32_t header = htonl(XXX_SOMESTUFF);
uint32_t other_header = htonl(1234);
uint32_t length = htonl(LENGTH_OF_DATA_PACKET);

void* data; /* data with length length */

connect(..);
struct iovec vectors[4]; /* declared in <sys/uio.h> */
vectors[0].iov_base = &header;
vectors[0].iov_len = sizeof(uint32_t);
vectors[1].iov_base = &other_header;
vectors[1].iov_len = sizeof(uint32_t);
vectors[2].iov_base = &length;
vectors[2].iov_len = sizeof(uint32_t);
vectors[3].iov_base = data;
vectors[3].iov_len = ntohl(length);

/* one syscall gathers all four buffers */
writev(fd, vectors, 4);

No memory allocation or copies are needed, and the multitude of send syscalls is avoided as well. As many network servers already deal with memory buffers, this might also be the more natural way of dealing with things.
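One thing the snippet above leaves out (like all error handling) is that writev, just like send, may write fewer bytes than requested. Here is a minimal sketch of how that could be handled. The helper names writev_all and send_packet are made up for this example, and the packet layout (three 32-bit fields followed by the payload) is simply the one used throughout this post:

```c
#include <assert.h>
#include <arpa/inet.h>
#include <errno.h>
#include <stdint.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical helper: keeps calling writev until every buffer has been
 * written. It modifies the iovec array while advancing through it.
 * Returns 0 on success, -1 on error (errno is set by writev). */
static int writev_all(int fd, struct iovec *iov, int iovcnt)
{
    while (iovcnt > 0) {
        ssize_t n = writev(fd, iov, iovcnt);
        if (n < 0) {
            if (errno == EINTR)
                continue;           /* interrupted by a signal, retry */
            return -1;
        }
        size_t done = (size_t)n;
        /* Skip all vectors that were written completely. */
        while (iovcnt > 0 && done >= iov->iov_len) {
            done -= iov->iov_len;
            iov++;
            iovcnt--;
        }
        /* Advance into a vector that was only partially written. */
        if (iovcnt > 0) {
            iov->iov_base = (char *)iov->iov_base + done;
            iov->iov_len -= done;
        }
    }
    return 0;
}

/* Hypothetical helper: sends one packet (header, other header, length,
 * payload) without copying the payload into a temporary buffer. */
static int send_packet(int fd, uint32_t type, uint32_t other,
                       const void *data, uint32_t data_len)
{
    assert(data != NULL || data_len == 0);

    uint32_t header = htonl(type);
    uint32_t other_header = htonl(other);
    uint32_t length = htonl(data_len);

    struct iovec vectors[4];
    vectors[0].iov_base = &header;
    vectors[0].iov_len = sizeof(header);
    vectors[1].iov_base = &other_header;
    vectors[1].iov_len = sizeof(other_header);
    vectors[2].iov_base = &length;
    vectors[2].iov_len = sizeof(length);
    vectors[3].iov_base = (void *)data;   /* iov_base is not const-qualified */
    vectors[3].iov_len = data_len;        /* host byte order for the size */

    return writev_all(fd, vectors, 4);
}
```

A call like send_packet(fd, XXX_SOMESTUFF, 1234, data, LENGTH_OF_DATA_PACKET) would then replace the four send calls from the first example. On a blocking socket the retry loop will rarely run more than once, but it keeps the code correct when it does.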

Besides writev there’s also a readv function, which can be used to receive from a file descriptor directly into a predefined array of memory locations. Think: packet retrieval. With this functionality you do not have to receive the whole packet and then parse its content, but can receive the packet directly into its final fields.
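As a rough sketch of the receiving side: the three fixed-size fields can be scattered straight into their variables with one readv call. The names readv_all and recv_header are made up for this example. Note that the payload size is only known once the length field has arrived, so the payload itself needs a second read (or readv) afterwards:

```c
#include <assert.h>
#include <arpa/inet.h>
#include <errno.h>
#include <stdint.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical helper: keeps calling readv until every buffer is filled.
 * Returns 0 on success, -1 on error or if the peer closed the connection
 * before all bytes arrived. */
static int readv_all(int fd, struct iovec *iov, int iovcnt)
{
    while (iovcnt > 0) {
        if (iov->iov_len == 0) {    /* nothing left to read into this one */
            iov++;
            iovcnt--;
            continue;
        }
        ssize_t n = readv(fd, iov, iovcnt);
        if (n < 0) {
            if (errno == EINTR)
                continue;           /* interrupted by a signal, retry */
            return -1;
        }
        if (n == 0)
            return -1;              /* end of stream in the middle of a packet */
        size_t done = (size_t)n;
        /* Skip all vectors that were filled completely. */
        while (iovcnt > 0 && done >= iov->iov_len) {
            done -= iov->iov_len;
            iov++;
            iovcnt--;
        }
        /* Advance into a vector that was only partially filled. */
        if (iovcnt > 0) {
            iov->iov_base = (char *)iov->iov_base + done;
            iov->iov_len -= done;
        }
    }
    return 0;
}

/* Hypothetical helper: receives the three header fields directly into the
 * caller's variables and converts them to host byte order. */
static int recv_header(int fd, uint32_t *type, uint32_t *other,
                       uint32_t *length)
{
    assert(type != NULL && other != NULL && length != NULL);

    struct iovec vectors[3];
    vectors[0].iov_base = type;
    vectors[0].iov_len = sizeof(*type);
    vectors[1].iov_base = other;
    vectors[1].iov_len = sizeof(*other);
    vectors[2].iov_base = length;
    vectors[2].iov_len = sizeof(*length);

    if (readv_all(fd, vectors, 3) != 0)
        return -1;

    *type = ntohl(*type);
    *other = ntohl(*other);
    *length = ntohl(*length);
    return 0;
}
```

After recv_header succeeds, the caller knows how many payload bytes follow and can read them into a buffer of exactly that size. A real server should also check the length field against a sane maximum before allocating anything.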