Finding files always in the same order

sethmachine · Jul 1, 2014

Hi,

Is there any way to force a script to always find files in the same order?

I notice that when I walk my directory, the script I am using collects the files in the order the file system sorts them, e.g.

dir1
x.txt
y.txt
z.txt

It will collect x, then y, then z.

But let's suppose I added a new file to dir1 called "a"

dir1
x.txt
y.txt
z.txt
a.txt

Now, all of a sudden, the script first takes a, then x, then y, then z.

So how could I force it to do x, y, z, a instead?

One (brute force) solution is simply to prefix every file with a number to force the order

dir1
0_x.txt
1_y.txt
2_z.txt
3_a.txt

But this has the downside that every time I need to make a new file in dir1, I have to prefix it with an integer. It also hurts readability.

A second solution would be to look up when each file was created and sort the collected files by that before doing any I/O.

Then again, if I have to read my files in a certain order, is that just a sign of bad organization?

For my problem, I basically have a directory that I occasionally add a new file to. When my script reads that directory, it assigns each file name a unique integer value. This value corresponds to a position in an auxiliary data structure. The problem is that if I add a new file, it ruins the order of the data structure, because the data structure cannot overwrite previous data. For the previous example, here is my data structure, D:

D[0] = Attributes(x.txt)
D[1] = Attributes(y.txt)
D[2] = Attributes(z.txt)

But now I have added a.txt, so here is what the structure looks like:

D[0] = Attributes(a.txt)
D[1] = Attributes(x.txt)
D[2] = Attributes(y.txt)
D[3] = Attributes(z.txt)

So the problem is that my data structure cannot overwrite previous values. This means everything is jumbled up now, e.g. D[0] will have attributes from both a.txt and x.txt. The same goes for every entry except D[3], which is a new slot and so holds only the attributes of z.txt.
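To make the shift concrete, here is a minimal Python sketch. It assumes the script numbers files in sorted-name order, which matches the behaviour described above. The helper name `assign_indices` is made up for illustration and is not part of the actual script.

```python
def assign_indices(filenames):
    """Number files in sorted-name order (assumed to match how the script sees them)."""
    return {name: i for i, name in enumerate(sorted(filenames))}


before = assign_indices(["x.txt", "y.txt", "z.txt"])
after = assign_indices(["x.txt", "y.txt", "z.txt", "a.txt"])

# Collect every file whose index changed once a.txt was added: (old index, new index).
moved = {
    name: (before[name], after[name])
    for name in before
    if before[name] != after[name]
}
print(moved)
```

Every file that existed before ends up one slot later, which is why each old position in D gets a second file's attributes written into it.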

Before solutions are proposed, note that it is impossible to change the auxiliary data structure D in any way that would allow sweeping away previous entries.

GhostWolf · Jul 1, 2014

In what situation would you need to access arbitrary files in creation order? (If they were not arbitrary, you would have the order defined in code.)

In any case, your question answers itself: you need to sort them by their creation time (or edit time, if that suits your needs).

edo494 · Jul 1, 2014

This should have gone into http://www.hiveworkshop.com/forums/programming-714/

A simple Google search turns up http://stackoverflow.com/a/21159271. If it is doable in one language on a given architecture, it is doable in any language with some trickery. (According to that answer, it is not possible to retrieve the creation date on most UNIX systems, including Linux.)
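As a rough Python sketch of sorting by creation time with that platform caveat in mind: the function names below are made up for illustration, and falling back to the last-modified time where no creation time is exposed is an assumption, not something the linked answer prescribes.

```python
import os


def creation_time(path):
    """Best-effort creation timestamp for path.

    st_birthtime is only present on some platforms (e.g. macOS and the BSDs).
    Where it is missing (e.g. Linux), fall back to the last-modified time.
    """
    info = os.stat(path)
    return getattr(info, "st_birthtime", info.st_mtime)


def files_oldest_first(directory, timestamp=creation_time):
    """Return the names of the files in directory, oldest first (ties broken by name)."""
    paths = [entry.path for entry in os.scandir(directory) if entry.is_file()]
    paths.sort(key=lambda p: (timestamp(p), os.path.basename(p)))
    return [os.path.basename(p) for p in paths]
```

With the fallback in play, editing an old file moves it to the end of the list, so the order is only stable for files that are never modified after creation.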

sethmachine · Jul 1, 2014

Aye, I did sort them by creation time, grudgingly.

Well, the files are constantly being changed and added to. And what use is code that doesn't work for the general case of any files rather than specific ones, e.g. when someone stores the code in an arbitrary directory structure?

GhostWolf · Jul 2, 2014

I am wondering what reason one would have to take an arbitrary collection of files, and do something with them in a creation-ascending order. I just can't think of any use case.

Dr Super Good · Jul 2, 2014

You will need to trawl the directory and locate all files, then sort them in a data structure by creation order. This is fine for a few files or even a few hundred, but with thousands, the number of file system I/O calls needed to locate files and query their dates may cause very poor performance. One way around this is incremental updates: have a process constantly monitor the directory for changes (most operating systems support this event-driven approach in some way) and update the sorted list accordingly. The list can be cached between process starts, removing the need for a discovery stage. If you are dealing with a lot of files in a complex folder structure, it may be sufficient to generate incremental updates from a brute force search of only the folders that have changed. In either case you will want to avoid brute force ordering of a huge number of files.

The reason you are getting them in some defined order may come down to the logic of the partition's file system. It is highly likely that file systems use some form of sorted structure (at least in the kernel-side view) to make navigation more efficient. Since most I/O accesses are done using a file path or URL, keeping entries sorted alphabetically will likely give a great improvement in navigation performance.
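A minimal Python sketch of the cached-list idea, under some loud assumptions: it rescans the directory on each run instead of using OS change notifications, it stores the list as JSON in a file kept outside the scanned directory, and it orders newly found files by last-modified time. The function name and the cache format are made up for illustration.

```python
import json
import os


def update_cached_order(directory, cache_path):
    """Load the cached file order, append files not seen before, and save it back."""
    try:
        with open(cache_path) as f:
            order = json.load(f)
    except FileNotFoundError:
        order = []  # first run: nothing cached yet

    known = set(order)
    new_entries = [
        entry
        for entry in os.scandir(directory)
        if entry.is_file() and entry.name not in known
    ]
    # New files are appended oldest-modified first, so existing positions never move.
    new_entries.sort(key=lambda entry: (entry.stat().st_mtime, entry.name))
    order.extend(entry.name for entry in new_entries)

    with open(cache_path, "w") as f:
        json.dump(order, f)
    return order
```

Only files that are not already in the cache get queried for their timestamps, and a file keeps its position in the list for good once it has been recorded. Deleted files stay in the list in this sketch.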

GhostWolf said: "I am wondering what reason one would have to take an arbitrary collection of files, and do something with them in a creation-ascending order. I just can't think of any use case."

Folder synchronization programs, backup programs and change detection programs all use this. The logic is that anything created after the last "synchronization" time will not yet be synchronized in the program, while anything created before this time will already be synchronized and can be skipped. If you could access resources in creation order, it would make such a synchronization faster. Obviously, sometimes it is more useful to use the last-modified date instead of the creation date, although for any file or folder there is a moment when both are the same.
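A small Python sketch of that check, assuming the last-modified date is the one being compared and that the last synchronization time is stored as seconds since the epoch. The name `changed_since` is made up for illustration.

```python
import os


def changed_since(directory, last_sync):
    """Yield paths under directory whose last-modified time is newer than last_sync."""
    for root, _dirs, files in os.walk(directory):
        for name in files:
            path = os.path.join(root, name)
            if os.stat(path).st_mtime > last_sync:
                yield path
```

Note that this still visits every file. The speed-up described above would only come from being able to list files already ordered by date and stopping at the first one older than the last synchronization.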
