
June 2013 News Batch

Status
Not open for further replies.
Level 14
Joined
Nov 18, 2007
Messages
816
It's deprecated because it only supports 2GB files [...]
Turns out that's bullshit.

Since it is so heavily used, they could not remove or change it, so instead they added nio, short for New Input/Output.
They never wanted to remove it, which is why nothing I used has an @Deprecated annotation.

New Java programs should avoid the old io in favour of nio. Adapters exist to convert between the two stream formats, however these are only for compatibility. In reality Java does file IO only using nio classes. The ones you see in the io package are wrappers around nio classes.
Turns out this is bullshit as well. All of this. java.io is an ALTERNATIVE to java.nio. The two packages have different designs (stream oriented vs. buffer oriented), and while java.nio is certainly a lot faster (I have heard claims of a >250% speedup), it doesn't matter if the bottleneck is not I/O.

java.io also doesn't use java.nio internally, which you could have verified by looking at the source.
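To illustrate the "stream oriented vs. buffer oriented" distinction, here is a minimal sketch (class and method names are mine, not from any code in this thread): java.io pulls bytes from a stream one call at a time, while java.nio has a channel fill a ByteBuffer which is then drained.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class IoVsNio {
    // Stream oriented (java.io): bytes are pulled sequentially from the stream.
    static int sumViaStream(Path p) throws IOException {
        int total = 0;
        try (InputStream in = new FileInputStream(p.toFile())) {
            int b;
            while ((b = in.read()) != -1) total += b;
        }
        return total;
    }

    // Buffer oriented (java.nio): the channel fills a buffer, then the buffer is drained.
    static int sumViaChannel(Path p) throws IOException {
        try (FileChannel ch = FileChannel.open(p, StandardOpenOption.READ)) {
            ByteBuffer buf = ByteBuffer.allocate((int) ch.size());
            while (buf.hasRemaining() && ch.read(buf) != -1) {}
            buf.flip();
            int total = 0;
            while (buf.hasRemaining()) total += buf.get();
            return total;
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("demo", ".bin");
        Files.write(tmp, new byte[] {1, 2, 3, 4});
        // Both read the same bytes; only the access model differs.
        System.out.println(sumViaStream(tmp) + " " + sumViaChannel(tmp));
        Files.delete(tmp);
    }
}
```

Both paths produce the same data, which is the point: the choice between them is about design and throughput, not capability.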
 

Dr Super Good

Spell Reviewer
Level 64
Joined
Jan 18, 2005
Messages
27,202
Turns out that's bullshit.
Um, maybe you did not understand me. io has no seek method for very large files; nio does. io requires you to skip forward (or backwards?) by an int amount, while nio uses a long absolute position. Although in this case, where you are reading a stream file, it makes no difference since everything has to be read sequentially, it does make a difference for an MPQ reader, as there you are reading chunks at random.

They never wanted to remove it, which is why nothing I used has an @Deprecated annotation.
They never can remove it; it is too heavily in use to ever be removed. Only a Java 2.0 could consider removing it, but that is not currently planned.

All of this. java.io is an ALTERNATIVE to java.nio.
Java io came first and has been with Java pretty much since the start; nio was made as an official package and then added into the standard edition due to a combination of popular demand and requirement.

it doesn't matter if the bottleneck is not I/O.
Which in this case it likely will be, since you are parsing a file.
 
Level 14
Joined
Nov 18, 2007
Messages
816
Um, maybe you did not understand me. io has no seek method for very large files; nio does. io requires you to skip forward (or backwards?) by an int amount, while nio uses a long absolute position. Although in this case, where you are reading a stream file, it makes no difference since everything has to be read sequentially, it does make a difference for an MPQ reader, as there you are reading chunks at random.
FileInputStream.skip()
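For what it's worth, the actual signatures support this rebuttal: `InputStream.skip(long n)` does take a long, though it is relative and may skip fewer bytes than requested, whereas `SeekableByteChannel.position(long)` jumps straight to an absolute offset, which is what random chunk access wants. A minimal sketch of the difference (names are mine):

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class SeekDemo {
    // java.io: relative skip; skip() may skip fewer bytes than asked,
    // so correct code has to loop until the offset is consumed.
    static int readByteAfterSkip(Path p, long offset) throws IOException {
        try (InputStream in = new FileInputStream(p.toFile())) {
            long remaining = offset;
            while (remaining > 0) {
                long skipped = in.skip(remaining);
                if (skipped <= 0) break;
                remaining -= skipped;
            }
            return in.read();
        }
    }

    // java.nio: absolute positioning; position() moves directly to the offset.
    static int readByteAt(Path p, long offset) throws IOException {
        try (FileChannel ch = FileChannel.open(p, StandardOpenOption.READ)) {
            ch.position(offset);
            ByteBuffer one = ByteBuffer.allocate(1);
            ch.read(one);
            return one.get(0) & 0xFF;
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("seek", ".bin");
        Files.write(tmp, new byte[] {10, 20, 30, 40});
        System.out.println(readByteAfterSkip(tmp, 2) + " " + readByteAt(tmp, 2));
        Files.delete(tmp);
    }
}
```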


They never can remove it; it is too heavily in use to ever be removed. Only a Java 2.0 could consider removing it, but that is not currently planned.
They can still officially deprecate it, which they have not done. So you pick the tool that's best for the job, and that's not necessarily java.nio.


Which in this case it likely will be, since you are parsing a file.
You do realize this is a simple proof-of-concept parser? I hope you also realize that the files to be parsed are on the order of hundreds of KiB in size, and that this parser is useless without something around it. I am taking a guess here and saying that anything useful built around it is going to have bottlenecks other than I/O (I might very well be wrong, but unless you plan on parsing hundreds of MiB, you won't notice when a file is loaded once).
 

Dr Super Good

Spell Reviewer
Level 64
Joined
Jan 18, 2005
Messages
27,202
Well here is my solution.

DOOIO is used to produce 2 different data container objects, one for Units and Items and the other for Doodads and Terrain Objects.

ioformats.doo.io contains all the structure classes. These are used for platform-independent interpretation of the data structures.

ioformats.doo contains all the data types and the deserialization class DOOIO. The data classes themselves could be organized into packages with separate deserialization classes for each type.

The reason I do not de-serialize each type within its own class is to avoid excessive importing of io classes. This improves cohesion, as all I/O activity is performed in a single place separate from the data manipulation. It also allows for different I/O types for the data (e.g. transferring across a network or storing in a more efficient file format). You can view this as a sort of I/O plugin model, where the data classes have no concrete I/O implementation; instead, a number of I/O classes that manipulate them live in the same package.
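A minimal sketch of that plugin-style split (the class and field names here are mine, not taken from the actual DOOIO code): the data class imports no I/O types, and a sibling class in the same package does all the buffer work.

```java
import java.nio.ByteBuffer;

// Plain data class: no I/O imports, no knowledge of any file format.
class Doodad {
    final int typeId;
    final float x, y;

    Doodad(int typeId, float x, float y) {
        this.typeId = typeId;
        this.x = x;
        this.y = y;
    }
}

// Separate I/O class in the same package: the only place that touches buffers.
// A network reader or an alternate on-disk format would be another sibling
// class here, requiring no change to Doodad itself.
public class DoodadIO {
    static Doodad deserialize(ByteBuffer buf) {
        return new Doodad(buf.getInt(), buf.getFloat(), buf.getFloat());
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocate(12).putInt(7).putFloat(1.5f).putFloat(2.5f);
        buf.flip();
        Doodad d = deserialize(buf);
        System.out.println(d.typeId + " " + d.x + " " + d.y);
    }
}
```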

SeekableByteChannel is used for the source, as that is the simplest type of channel that declares its size. Since I am reading the entire file into a buffer (the files are small), it is important to know how large the buffer must be.
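The "size the buffer from the channel" idea can be sketched like this (my own helper, assuming the whole file fits in an int-sized buffer, which holds for small files like .doo):

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.SeekableByteChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class WholeFileRead {
    // size() tells us exactly how big the buffer must be, so the whole
    // (small) file can be read with a single allocation.
    static ByteBuffer readAll(SeekableByteChannel ch) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate((int) ch.size());
        while (buf.hasRemaining() && ch.read(buf) != -1) {}
        buf.flip();
        return buf;
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("doo", ".bin");
        Files.write(tmp, "W3do".getBytes(StandardCharsets.US_ASCII));
        try (SeekableByteChannel ch = Files.newByteChannel(tmp, StandardOpenOption.READ)) {
            System.out.println(readAll(ch).remaining());
        }
        Files.delete(tmp);
    }
}
```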

The problem with .doo files is their complex stream-oriented format. This prevents one from using a super-efficient buffering scheme without extra intermediate data copies. For comparison, the format MPQ uses for archived files is buffer oriented, and all chunks of a file are guaranteed to be equal to (uncompressed) or less than (compressed) the defined buffer size. Since .doo files are intended to be relatively small, it should make no difference to load the entire file into memory and then process it.

By manipulating a FileChannel directly as a source, memory mapping could be used to eliminate internal buffer allocation and reading altogether. This would be considerably faster, as no extra copying of file data is required after the OS pages it in. However, implementing such a method seems pointless, since .doo files are located within .mpq archives and are usually compressed, so the data requires at least one buffer swap in any case.
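For reference, the memory-mapping route mentioned above looks like this in java.nio (a minimal sketch, my own names; `FileChannel.map` returns a MappedByteBuffer backed by the OS page cache):

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MapDemo {
    // The OS pages the file in on demand: no explicit read() calls and
    // no intermediate heap buffer are needed.
    static int sumMapped(Path p) throws IOException {
        try (FileChannel ch = FileChannel.open(p, StandardOpenOption.READ)) {
            MappedByteBuffer map = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            int total = 0;
            while (map.hasRemaining()) total += map.get();
            return total;
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("map", ".bin");
        Files.write(tmp, new byte[] {1, 2, 3, 4});
        System.out.println(sumMapped(tmp));
        // Deleting a still-mapped file can fail on some platforms (e.g. Windows),
        // so failure to delete the temp file is tolerated here.
        try { Files.delete(tmp); } catch (IOException ignored) {}
    }
}
```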

Thus the overall flow of reading a .doo file from a standard WorldEdit-made .mpq would go...
compressed chunks -> raw chunks -> single file in buffer -> native objects

Since .mpq reading is already buffered, a lighter, more streaming version could be used that only uses a buffer of the largest structure size, filled with the required bytes at each stage. Whether this would be faster is debatable, as it trades a possibly cache-inefficient large buffer with a single filling call for a smaller, cache-friendly buffer with a huge number of filling calls. It would also be slower when reading files directly, as it would require buffering and extra memory copies.
 

Dr Super Good

Spell Reviewer
Level 64
Joined
Jan 18, 2005
Messages
27,202
It would be better if someone used all this in a project. People would then write the code with the intention of it actually being used.

Obviously you would need to define the code specification better, preferably as an interface with a language restriction, as those are easiest to use in other components.
 
Level 31
Joined
Jul 10, 2007
Messages
6,306
I want to make a third-party editor, in C++, with plugin functionality in Lua, I think; I am still undecided on plugins.

So I'm slowly writing a parser for every map file.

So vJASS and Galaxy?

What are you using for parsing?

I'm working on something that's integrated into Notepad++, but the tool chain isn't specifically tied to it, so we can combine ;D.

I'm already doing a makefile framework for vJASS using Lua and C++. I also have some C++ for embedding Lua into vJASS.
 

Dr Super Good

Spell Reviewer
Level 64
Joined
Jan 18, 2005
Messages
27,202
Trial and error. I found out about it when I was trying to create an efficient save/load system that could resolve each character in O(1) time instead of the usual O(n) time which most save/load systems used. To do this I used the hash of a string and started to notice strange and wrong results. After some research I found that both 'A' and 'a' give the same result when passed to the hash function. This meant I needed to use an O(n) linked list to resolve between cases.

Since the hash produced is not case sensitive, any strings with the same content in a different case would hash to the same result.
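The effect described can be sketched in Java with a hypothetical stand-in for that hash function (this mirrors the described behaviour of WC3's StringHash, it is not the real implementation): hashing the lowercased string makes all case variants collide, so a per-bucket list scan is needed to tell them apart.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class CaseInsensitiveHash {
    // Hypothetical stand-in for the case-insensitive hash: "A" and "a"
    // produce the same value because the string is lowercased first.
    static int hash(String s) {
        return s.toLowerCase(Locale.ROOT).hashCode();
    }

    public static void main(String[] args) {
        // Case variants land in the same bucket, so an O(n) list scan
        // (comparing with equals) is required to resolve between cases.
        Map<Integer, List<String>> buckets = new HashMap<>();
        for (String s : new String[] {"A", "a", "Ab", "aB"}) {
            buckets.computeIfAbsent(hash(s), k -> new ArrayList<>()).add(s);
        }
        System.out.println(buckets.size()); // 4 strings, only 2 distinct hashes
    }
}
```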

I am pretty sure I was not the first or last person to notice this. People like Nes also wrote save/load systems, and I am guessing he somehow came across it as well.
 