6

The log4j network adapter sends events as a serialised java object. I would like to be able to capture this object and deserialise it in a different language (python). Is this possible?

NOTE: Capturing the network traffic is easy; it's just a TCP socket and reading in a stream. The difficulty is the deserialising part.

4

6 Answers

5

Generally, no.

The stream format for Java serialization is defined in this document, but you need access to the original class definitions (and a Java runtime to load them into) to turn the stream data back into something approaching the original objects. For example, classes may define writeObject() and readObject() methods to customise their own serialized form.
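To give a feel for what the stream looks like on the wire, here is a minimal Python sketch that reads only the stream header and the name of the first class descriptor. The magic number 0xACED, the version 5 and the two type tags are taken from the serialization format; the helper name `peek_class_name` and the hand-built sample bytes are assumptions made for illustration. It does not reconstruct any object, which is exactly the part that needs the class definitions.

```python
import io
import struct

STREAM_MAGIC = 0xACED   # first two bytes of every serialization stream
STREAM_VERSION = 5      # protocol version written right after the magic
TC_OBJECT = 0x73        # tag: a new object follows
TC_CLASSDESC = 0x72     # tag: a new class descriptor follows


def peek_class_name(stream):
    """Return the class name of the first object in a Java serialization stream.

    Hypothetical helper: it handles only the simplest layout (header, then
    TC_OBJECT, then TC_CLASSDESC) and raises ValueError on anything else.
    """
    header = stream.read(4)
    if len(header) < 4:
        raise ValueError("truncated stream header")
    magic, version = struct.unpack(">HH", header)
    if magic != STREAM_MAGIC or version != STREAM_VERSION:
        raise ValueError("not a Java serialization stream")

    tags = stream.read(2)
    if tags != bytes([TC_OBJECT, TC_CLASSDESC]):
        raise ValueError("unsupported stream layout")

    length_bytes = stream.read(2)
    if len(length_bytes) < 2:
        raise ValueError("truncated class descriptor")
    (name_length,) = struct.unpack(">H", length_bytes)
    # Class names are written as modified UTF-8; plain ASCII names decode the same way.
    return stream.read(name_length).decode("utf-8")


# Hand-built sample bytes (an assumption for illustration, not captured traffic).
name = b"org.apache.log4j.spi.LoggingEvent"
sample = b"\xac\xed\x00\x05\x73\x72" + struct.pack(">H", len(name)) + name
first_class = peek_class_name(io.BytesIO(sample))
```

Getting the class name out is the easy part; the field values that follow only make sense once you know how that exact class version lays them out.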

(edit: lubos hasko suggests having a little java program to deserialize the objects in front of Python, but the problem is that for this to work, your "little java program" needs to load the same versions of all the same classes that it might deserialize. Which is tricky if you're receiving log messages from one app, and really tricky if you're multiplexing more than one log stream. Either way, it's not going to be a little program any more. edit2: I could be wrong here, I don't know what gets serialized. If it's just log4j classes you should be fine. On the other hand, it's possible to log arbitrary exceptions, and if they get put in the stream as well my point stands.)

It would be much easier to customise the log4j network adapter and replace the raw serialization with some more easily deserialized form (for example, you could use XStream to turn the object into an XML representation).

answered 2008-08-27T04:35:57.700
2

Theoretically, it's possible. Java Serialization, like pretty much everything in Javaland, is standardized, so you could implement a deserializer according to that standard in Python. However, the Java Serialization format is not designed for cross-language use; it is closely tied to the way objects are represented inside the JVM. While implementing a JVM in Python is surely a fun exercise, it's probably not what you're looking for (-:

There are other (data) serialization formats that are specifically designed to be language agnostic. They usually work by stripping the data formats down to the bare minimum (number, string, sequence, dictionary and that's it) and thus requiring a bit of work on both ends to represent a rich object as a graph of dumb data structures (and vice versa).

Two examples are JSON (JavaScript Object Notation) and YAML (YAML Ain't Markup Language).
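The "bit of work on both ends" might look like this on the Python side. This is only a sketch: the `LogEvent` class and its four fields are hypothetical stand-ins, not the layout of a real log4j event, and the Java side would need matching code to produce the same JSON.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class LogEvent:
    # Hypothetical fields; a real log4j event carries more than this.
    logger: str
    level: str
    message: str
    timestamp_ms: int


def to_wire(event):
    """Flatten the event into a JSON string made of plain strings and numbers."""
    return json.dumps(asdict(event), sort_keys=True)


def from_wire(text):
    """Rebuild a LogEvent from the JSON text produced by to_wire."""
    return LogEvent(**json.loads(text))


event = LogEvent("com.example.App", "INFO", "started", 1219811757700)
wire = to_wire(event)
restored = from_wire(wire)
```

The rich object exists only at the two ends; what travels in between is a dictionary of strings and numbers that any language can read.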

ASN.1 (Abstract Syntax Notation One) is another data serialization format. Instead of dumbing the format down to a point where it can be easily understood, ASN.1 with an encoding such as BER is largely self-describing: every value carries its own type tag and length, so the structure of a stream can be decoded from the stream itself.

And, of course, XML (eXtensible Markup Language), will work too, provided that it is not just used to provide textual representation of a "memory dump" of a Java object, but an actual abstract, language-agnostic encoding.

So, to make a long story short: your best bet is to coerce log4j into logging in one of the above-mentioned formats, to replace log4j with something that does, or to intercept the objects before they are sent over the wire and convert them before they leave Javaland.

Libraries that implement JSON, YAML, ASN.1 and XML are available for both Java and Python (and pretty much every programming language known to man).

answered 2008-08-27T05:05:47.270
1

In theory it's possible. How difficult it is in practice depends on whether the Java serialization format is documented or not. I guess it's not. Edit: oops, I was wrong, thanks Charles.

Anyway, this is what I suggest you do:

  1. Capture from log4j and deserialize the Java object in your own little Java program.

  2. Now that you have the object again, serialize it using your own custom formatter.

    Tip: Maybe you don't even have to write your own custom formatter. For example, JSON (scroll down for libs) has libraries for Python and Java, so you could in theory use a Java library to serialize your objects and the equivalent Python library to deserialize them.

  3. Send the output stream to your Python application and deserialize it.
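The Python end of the last step could be sketched as follows, assuming the Java relay writes one JSON document per line. That framing, the function name `read_events` and the sample events are all assumptions for illustration; nothing in log4j produces this format by itself.

```python
import io
import json


def read_events(stream):
    """Yield one dict per non-empty line of a binary, newline-delimited JSON stream."""
    for raw_line in stream:
        line = raw_line.strip()
        if line:
            yield json.loads(line.decode("utf-8"))


# In real use the stream could come from sock.makefile("rb") on a connected
# socket; a BytesIO stands in here so the sketch runs without a network.
fake_socket = io.BytesIO(
    b'{"level": "INFO", "message": "started"}\n'
    b'{"level": "ERROR", "message": "disk full"}\n'
)
events = list(read_events(fake_socket))
```

One event per line keeps the framing trivial: the reader never has to guess where one message ends and the next begins.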

Charles wrote:

the problem is that for this to work, your "little java program" needs to load the same versions of all the same classes that it might deserialize. Which is tricky if you're receiving log messages from one app, and really tricky if you're multiplexing more than one log stream. Either way, it's not going to be a little program any more.

Can't you simply reference the Java log4j libraries in your own Java process? I'm just giving general advice here that is applicable to any pair of languages (the title of the question is pretty language-agnostic, so I provided one of the generic solutions). Anyway, I'm not familiar with log4j and don't know whether you can "inject" your own serializer into it. If you can, then of course your suggestion is much better and cleaner.

answered 2008-08-27T04:41:41.133
1

I would recommend moving to a third-party format (by creating your own log4j adapters, etc.) that both languages understand and can easily marshal and unmarshal, e.g. XML.
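On the Python side, unmarshalling such XML takes only the standard library. The sketch below assumes a made-up `<event>` element with `logger`, `level` and `timestamp` attributes and a `<message>` child; a custom adapter on the Java side would have to emit exactly that shape.

```python
import xml.etree.ElementTree as ET

# Hypothetical document shape; the Java side would have to emit exactly this.
SAMPLE = """<event logger="com.example.App" level="WARN" timestamp="1219811757700">
  <message>cache miss</message>
</event>"""


def parse_event(xml_text):
    """Turn one <event> element into a plain dict."""
    root = ET.fromstring(xml_text)
    return {
        "logger": root.get("logger"),
        "level": root.get("level"),
        "timestamp_ms": int(root.get("timestamp")),
        "message": root.findtext("message"),
    }


event = parse_event(SAMPLE)
```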

answered 2008-08-27T04:45:58.377
1

Well, I am not a Python expert, so I can't comment on how to solve your problem, but if you have a program in .NET you may use IKVM.NET to deserialize Java objects easily. I have experimented with this by creating a .NET client for log4j log messages written to a socket appender, and it worked really well.

I am sorry if this answer does not make sense here.

answered 2008-08-27T05:22:24.033
0

If you can have a JVM on the receiving side and the class definitions for the serialized data, and you only want to use Python and no other language, then you may use Jython:

  • you would deserialize what you received using the correct Java methods
  • and then process what you get with your Python code

answered 2009-02-20T17:34:13.313