Category — Storage
Enriching metadata
Search engines like Google™ can fetch documents scattered across the web in no defined order within seconds, with an accuracy that never ceases to surprise. On the other hand, let us pick something like Beagle™, the desktop search engine that comes with Ubuntu Linux. Why can Google turn out millions of accurate results (unless you are searching for “elegant software design with perl”) while Beagle struggles? The answer is that Google’s PageRank looks at links within pages and whatnot in order to determine the rank of a webpage on the web. We then get to the point of improving file metadata through links between files. Sounds odd, doesn’t it? Well, keeping relationships between files isn’t exactly the best way to go about things at the moment. Reason? Volatile memory (RAM) has to be backed up by slower hard disks. However, non-volatile memory like MRAM might just be the ideal solution.
I found a research paper that introduces a filesystem called LiFS - Linking File System. LiFS enables linking between files.
LiFS introduces two system calls - rellink and relsymlink. The only difference between the two is that the former creates a hard link and the latter creates the more popular (among unix users) symbolic link. rellink <source> <destination> should hard-link the destination to the inode of the source file. LiFS allows users to create multiple such links between the same two files, so that users can define several relationships between the files, but it does ensure that each link is uniquely identifiable.
For searching the filesystem, LiFS provides “openlinkset” which allows the user to see all outgoing links from a file and “openmatchlinkset” which provides links from a source having matching attributes.
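To get a feel for the idea, here is a toy in-memory model in Python. It is only a sketch and not the LiFS interface: the class `LinkStore`, the record layout and the keyword-argument attributes are my own assumptions, and the method names merely borrow the names of the calls mentioned above.

```python
from collections import defaultdict
from itertools import count


class LinkStore:
    """Toy model of attributed file-to-file links (not the real LiFS interface)."""

    def __init__(self):
        self._ids = count(1)           # every link gets its own id
        self._out = defaultdict(list)  # source path -> list of link records

    def rellink(self, source, destination, **attrs):
        """Record a link from source to destination carrying key/value attributes."""
        link = {"id": next(self._ids), "src": source, "dst": destination, "attrs": attrs}
        self._out[source].append(link)
        return link["id"]

    def openlinkset(self, source):
        """Return all outgoing links of a file."""
        return list(self._out.get(source, []))

    def openmatchlinkset(self, source, **wanted):
        """Return outgoing links whose attributes contain every wanted pair."""
        return [
            link for link in self._out.get(source, [])
            if all(link["attrs"].get(k) == v for k, v in wanted.items())
        ]
```

Two links between the same pair of files stay distinguishable because each one gets its own id, which is the property the paragraph above cares about.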
LiFS is a cool idea that I feel the semantic filesystems of tomorrow should pick up (I mean tomorrow/later; non-volatile memory is not something at our/my disposal, at least). Anyway, happy reading!
February 13, 2008 No Comments
The obsession with semantic filesystems continues……
I have this habit of getting glued to fs types this year. I found another paper on semantic filesystems here.
In my search for a semantic filesystem, I was quite pissed by the extra huge number of desktop search applications, all making logical queries to a file’s metadata. Sure, desktop search is cool, but pray how do third party apps use your search tools ?
I believe that one needs a filesystem based on tagging and not another Spotlight clone. During my long and exhaustive search for such a filesystem, I came across tagsistant. Tagsistant is a 3 day hack (!). Tagsistant uses FUSE to implement an fs with an SQLite3 database. There is even a GUI to define relations between tags and to define new tags altogether.
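Here is roughly what I mean by a tag store that other programs can query. This is a sketch using Python's built-in sqlite3 module; the table name `tagged` and the three helper functions are hypothetical and have nothing to do with Tagsistant's actual schema.

```python
import sqlite3


def open_tag_db(path=":memory:"):
    """Create the (hypothetical) schema: one row per (file, tag) pair."""
    db = sqlite3.connect(path)
    db.execute("CREATE TABLE IF NOT EXISTS tagged (file TEXT, tag TEXT, UNIQUE(file, tag))")
    return db


def tag_file(db, file, *tags):
    """Attach one or more tags to a file; repeated tags are ignored."""
    db.executemany("INSERT OR IGNORE INTO tagged VALUES (?, ?)", [(file, t) for t in tags])


def files_with_all(db, *tags):
    """List files carrying every given tag, like listing a tag1/tag2 'directory'."""
    tags = sorted(set(tags))
    marks = ",".join("?" * len(tags))
    rows = db.execute(
        f"SELECT file FROM tagged WHERE tag IN ({marks}) "
        "GROUP BY file HAVING COUNT(DISTINCT tag) = ? ORDER BY file",
        (*tags, len(tags)),
    )
    return [file for (file,) in rows]
```

Any third-party app that can open the database can ask the same question, which is exactly what a closed desktop search tool does not give you.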
Apart from that snippet on all things semantic, I just signed up for an account in facebook. I have heard so much about the facebook api and how well it was tested against django. Let me see if I can come up with a cool facebook app sometime later.
I’ve got to sit (stand actually) for my practical examinations on the 4th and 5th. I did revise a few experiments in the FIITJEE lab and I hope it goes well.
BTW, I have a few euler problem solutions here that will be put up in the next post. I am close to solving problem#25 now and once I finish, an update containing solutions to problems which are worth mention will be posted here.
Till the next post, goodbye!
February 2, 2008 No Comments
Update…..
I returned after writing my English exam today at FIITJEE. Hats off to them for having thrown the practical examination 3 days after the prefinals end. I have never seen an institution with such excellent planning. I am not going to waste space on this blog writing about this education factory. Let me instead dwell upon what I have done in the past week.
I finished reading the final year project of PrashMohan (his IRC nick). His FYP talks about a semantic filesystem - SemFS. SemFS is a filesystem that implements file searching based on a file’s metadata. It is a solution to the problem of ever-increasing hard-disk capacity making it difficult to retrieve files.
SemFS allows a user to choose either the traditional “/home/shriphani/bingo.txt” hierarchy or a mechanism based on queries to offer virtual directories. SemFS provides properties of “Owner” and “Timestamp” by default. Files like JPEG images and MP3s of course carry info about the height and width (image properties) and length and artist respectively.
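A virtual directory of this kind is really just a query over file properties. The following Python sketch is my own illustration and not SemFS code: `file_properties` and `virtual_directory` are made-up names, and the only properties used are the owner and timestamp that `os.stat` can supply.

```python
import os


def file_properties(path):
    """Collect the two default properties mentioned above from os.stat()."""
    st = os.stat(path)
    return {"path": path, "owner": st.st_uid, "timestamp": st.st_mtime}


def virtual_directory(root, predicate):
    """Walk root and return the paths whose properties satisfy predicate."""
    matches = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            props = file_properties(os.path.join(dirpath, name))
            if predicate(props):
                matches.append(props["path"])
    return sorted(matches)
```

Something like `virtual_directory("/home/shriphani", lambda p: p["timestamp"] > cutoff)` would then play the role of a "modified recently" directory, with `cutoff` being whatever time you choose.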
SemFS reminds me of Beagle, the search engine that comes with Ubuntu nowadays (and makes computers like mine groan under the strain - I use Debian though). I had a look at how Beagle worked and it seems to work in the same fashion, by looking at a file’s metadata.
It was a good read indeed and I have decided to create my own semantic filesystem! I was trying out something called LUFS-Python when I was in class 11. LUFS allows you to create filesystems in Python. I will aim my filesystem at devices like iPods (I love mine). I have been pissed by the impracticality of picking ext3fs as the filesystem. I believe media players need filesystems with features like:
- Versioning - Do take snapshots at times and do not let the user fiddle with these. If snapshots promise to tick off all the media, just archive the metadata of each song. It will be of great help in combating piracy and will for once force people to look beyond recording labels for artists who release their music under open-source licenses.
- Lightweight caching application for updating the metadata. Tag-cache from rockbox sucks too much of my battery.
I will keep adding to this list and then see how it turns out.
BTW this blog has undergone another transformation and I picked this theme as I like its feel. I am going to put up a post for Lorelle Van Fossen’s blogging challenge. But first BOFH beckons !
January 25, 2008 1 Comment
Surprise mail and ideas.
Here is an excerpt from a mail in my gmail inbox:
On Mon, Jan 07, 2008 at 09:31:39AM +0530, Shriphani Palakodety wrote:
> Hello,
> You are my IDOL !!. Every ext2fs utility that I see is made by you.
> There are countless times when debugfs and e2fsck played an important
> part in my “piddling” around with external devices. I applied to MIT
> for a place in the class of 2012 and I aim to be like you.
I’m glad those tools have been helpful for you.
Good luck getting into MIT!!
- Ted
That was a mail from Theodore Ts’o, the extremely cool maintainer of the e2fsprogs package and one of North America’s first Linux developers. He has like 2 degrees from MIT (YAY!) and now works at IBM and gets paid to hack on Linux (in short, gets paid to do what he likes).
So let’s move on to something else. I happened to see this article on Semantic Filesystems and began thinking about it the entire day. I was trying to figure out the purpose they would serve. I then thought of the now-hacked TWINCLING WIKI (our mistake really). The web2.0 philosophy is more about reaching out using the Web and other nonsense. Let us just say one opens up a “publicly editable” content management system - one where, without logging into an account, an individual can put content up for the world to see (something like free advertising space - an idea? probably). Let us say I install something like Drupal and set up FCKeditor to allow people to throw content on this site. PHP’s file APIs are more or less unix-like. If one gets the weird idea of throwing an “rm -rf” in there and getting it to execute somehow, BOOM!
It might seem easy to recover at first. But beware, once this CMS is removed, there is every chance that the backup is kicked out as well (if the backup is in the same directory as the CMS). Let us just say we had a filesystem that knew about this installation and knew where the backups were. Within minutes after the attack, the intelligence the fs possesses should enable it to reinstall the CMS and put all the posts back. This is a better way of doing things than, let’s say, reinstalling the CMS manually hours after it has been compromised. Sounds like a far cry, but is possible.
There can be worse situations at times. Let us just say that an administrator has found that a certain movie is taking up too much space on the filesystem (probably the movie is 5 gigs in size). He goes on to delete this file and realizes that no space has been freed. This is because the file will continue to take up space on the drive till every process which has it open closes it or is killed. Now that the file has no name, it is much harder to deal with. A filesystem with inherent intelligence should be able to perform the hard kill (signal 9) on every process accessing this file (this process could well be a search application keeping track of the files). Such enormous potential is what intelligent filesystems hold. 5 months till college. I can hardly wait.
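The "deleted but still eating space" behaviour is easy to reproduce. Here is a small Python sketch for a POSIX system such as Linux; the function name `unlink_while_open` and the dictionary it returns are just my own illustration.

```python
import os
import tempfile


def unlink_while_open(data=b"x" * 4096):
    """Delete a file's only name while keeping it open; report what is still visible."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, data)
        os.unlink(path)                # the name is gone...
        st = os.fstat(fd)              # ...but the open descriptor still reaches the data
        os.lseek(fd, 0, os.SEEK_SET)
        still_readable = os.read(fd, len(data)) == data
        return {
            "name_exists": os.path.exists(path),
            "link_count": st.st_nlink,
            "size": st.st_size,
            "still_readable": still_readable,
        }
    finally:
        os.close(fd)                   # only after this can the space be reclaimed
```

On Linux, `lsof +L1` lists open files whose link count has dropped below one, which is one way to find the process that is holding on to such a file.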
By the way, that tweet over there --> will soon change
January 7, 2008 1 Comment
Post Christmas….
I am a Hindu, so what? I take every opportunity to celebrate. So I went with my dad to one of these eateries that exemplifies the “Today pleasure, tomorrow diarrhoea” category. Well I don’t actually have diarrhoea now but what happened in the morning was … well figure it out.
Later that day my dad came into my room to have a look at what I was doing. I was working on “timepass”, a web application that plans to do many things I am not too sure of (http://launchpad.net/timepass). He showed me an external drive and told me that he was not allowed to access directories one level below the root directory of the device. He plugged it into his laptop (a gleaming Dell Inspiron) and showed me. I was stumped: FAT32 drives and no permissions? I remember that something similar had occurred once upon a time with my iPod as well. I decided to work on this. I had two possible ideas about what could go wrong:
1. Drivelocks (we can’t do much apart from call the vendor of the drive and pray that he gives the password to us)
2. Filesystem needs repair. This could be done easily using the command “dosfsck -a <device>”.
I asked my dad about the drive. It was a Toshiba make and was the HD in his previous Compaq laptop. After this I knew what was wrong, drivelocks. I don’t know if drivelocks can be repaired with dosfsck and I decided to give it a try. I plugged it into my laptop (runs Debian remember) and it got mounted. I could even navigate as per my wish!
This leads us to conclude one thing: “Linux doesn’t give a damn about drivelocks”!
I offered to do a direct dump for my dad using the dd command. I had a 200 gig Maxtor and after a period of more than 15 minutes, the copying was done. I get to keep the new drive and my dad gets the previous drive. He had quite important stuff in there, most of it pertaining to his work (designs and blah blah).
I don’t know what impact this blog has in the IT world but if Toshiba reads this, they are going to <censored> bricks.
After that I got to look at software my dad uses to design stuff. There was Staad Pro, Tekla Structures XSteel and something called DTH. He was talking about productivity as I saw him swiftly create figures that made no sense to me (he is building a port so I am not expected to have any idea about it).
I am submitting all my essays tomorrow. Man, it really gives me the jitters. There is something about this admissions process that makes me feel like “someone cares”. Is it true? Does someone care if you are a geek at 16? Does someone care if your mouth waters when you look at Apple’s 10 TB storage rack? Does someone care if you broke both your bones and almost lost your life, prepared for the class 10 boards without going to school for a major part of the year and carried the injury to class 11?
The answer is: yes. Someone cares. Someone really important cares. I am going to click on the “Submit” button tomorrow and I feel privileged to do so. I really cannot believe that a college from which Claude Shannon graduated is going to read my application, a college where Google finds its roots is going to read my application, or a college where Raj Reddy sits is going to read my application. It is just too elite to think of. All I can do is thank those who have made this possible, dad, mom etc.
I am getting too emotional now and my blog posts suck if I get too emotional. I will write again later.
December 26, 2007 2 Comments
What I am up to these days
I have not been able to post a lot these days as I am all tensed about applying to colleges and so on, it is indeed a very stressful process. I have shortlisted a few but I think it is better I don’t shoot only for the top few and also look at tier 2 colleges with decent research prospects.
Now I am an individual who is eager to play with things. After the LUG meet I decided to make my own file manager extension for firefox that would enable me to easily pick the pictures that I would like to upload. I have made a basic UI for it and I will soon come up with a working version of it in about a week.
Apart from that I got to play with BSD again courtesy a generous gift from Deependra Shekawat ( a great friend who is a freenode regular ). He sent me the PCBSD installation cd. Let me recollect the process.
I put the cd in and up came a very good-looking screen that enabled me to cruise through the process and I kept encountering worthless pics that claim its capability to play all my music and edit all my files and whatnot. Typical “BSD on the desktop is finally here” kind of presentation. After a few minutes the installation finished and I was staring at KDE (oh how those jumping icons irritate me). Surprisingly everything works including my wireless. Even the distracting LED (ACER’s innovative design. They place the radio kill switch under the touchpad so that I can always hit it and see the Network Manager applet tell me that no wireless networks exist). I decide to experiment. The .pbi method of installing things irritates me further.
I recollect that a very dear friend who uses FreeBSD had told me about the complaints BSD threw when a device was not unmounted correctly. I plug my external drive in and yank it out immediately. I plug it in again and type `mount /dev/da0s1 /mnt/external_disk` at the shell and get some stupid error report. I investigate further.
Tune2fs seems to be a good utility to look at ext3 fs parameters (my external drive is ext3 formatted, I know it is not smart but who cares). I notice this :
Filesystem features: has_journal resize_inode dir_index filetype needs_recovery sparse_super large_file
Bingo! So all I need to do is remove the ‘needs_recovery’ “feature” (stupid I know).
Being the MIT appreciator I pick debugfs which according to its manpage is written by someone from MIT (woohoo!!). So here goes:
[root@psp-laptop /root]# debugfs
debugfs: open -w -f /dev/da0s1
debugfs: features
Filesystem features: has_journal resize_inode dir_index filetype needs_recovery sparse_super large_file
debugfs: features -needs_recovery
Filesystem features: has_journal resize_inode dir_index filetype sparse_super large_file
debugfs: quit
So there we are. I would however recommend that one runs e2fsck on the drive. That will also remove the troublesome “feature”.
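If you want to check for the flag from a script instead of eyeballing it, something like this Python sketch would do. It assumes you have already captured the text that tune2fs prints (for example from `tune2fs -l <device>`); the function names are mine.

```python
def parse_features(tune2fs_output):
    """Return the feature names from the 'Filesystem features:' line, or []."""
    for line in tune2fs_output.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Filesystem features":
            return value.split()
    return []


def needs_recovery(tune2fs_output):
    """True when the needs_recovery flag appears among the features."""
    return "needs_recovery" in parse_features(tune2fs_output)
```

Fed the features line shown earlier, `needs_recovery` returns True, and False once the flag has been cleared.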
Work, work and more work. I need to draft those essays now.
I will write again later
November 6, 2007 No Comments
Pdumpfs - a cool versioning utility, Elephant Filesystem
Here is a simple versioning utility - for those who like to keep it short and simple. pdumpfs is similar to Plan9’s dumpfs. pdumpfs is written in Ruby and stores snapshots in directories named in YYYY/MM/DD format. Only those files which have actually been modified are copied into the current snapshot. Those which haven’t been modified end up as hardlinks to the previous snapshot. Cool, ain’t it? Have a look at pdumpfs at http://0xcc.net/pdumpfs/index.html.en. Enjoy!
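The hardlink trick is simple enough to sketch in a few lines of Python. This is not pdumpfs's code (that one is Ruby): the function `snapshot` is my own, and comparing file contents with `filecmp` to decide what counts as "modified" is an assumption on my part.

```python
import filecmp
import os
import shutil


def snapshot(source, previous, current):
    """Copy source into current, hard-linking files unchanged since previous.

    previous may be None for the very first snapshot.
    """
    for dirpath, _dirs, files in os.walk(source):
        rel = os.path.relpath(dirpath, source)
        target_dir = os.path.normpath(os.path.join(current, rel))
        os.makedirs(target_dir, exist_ok=True)
        for name in files:
            src = os.path.join(dirpath, name)
            dst = os.path.join(target_dir, name)
            old = os.path.normpath(os.path.join(previous, rel, name)) if previous else None
            if old and os.path.isfile(old) and filecmp.cmp(src, old, shallow=False):
                os.link(old, dst)        # unchanged: share the inode with the old snapshot
            else:
                shutil.copy2(src, dst)   # new or modified: store a fresh copy
```

You would call it once a day with yesterday's directory as `previous` and today's YYYY/MM/DD directory as `current`; unchanged files then cost a directory entry instead of a full copy.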
I came across a pretty interesting filesystem called the Elephant FileSystem. Why? Because elephants never forget! EFS was made keeping the public in mind. That’s right! EFS is what I call cool. Let me explain. Manually controlled versioning filesystems need to be controlled (excuse the redundancy). This can result in too many backups, not to mention the confusion that follows. Apart from this, let’s just say you are working on a file and make a blooper (read as: edit a whole paragraph and save, only to realize that the previous paragraph was better) and did not run the versioning utility before undertaking this enterprise; you have to resort to your memory. EFS versions a file as soon as it is written to, so most operations are reversible. Deleting a file does not mean that it has been booted out of the storage device. Data retention policies can be specified for every file or a group of files.
EFS scores all its brownie points courtesy its understanding of how users store files on their drives. Storage is always limited. One can never have an infinitely large storage device. When you get close to occupying all the space on your drive, it becomes necessary to kick out a few of those versions. While it becomes impossible for us to remember what distinguishes one version of a file from another, EFS decides what to keep using the concept of landmark files. EFS’s creators believe that a landmark file should be allowed to stay and other less important versions be removed. The heuristics for picking landmarks should of course be determined by us. A landmark could be the newest version of the file or one which had the largest time gap between successive versions and so on.
It is also important to distinguish between source files and object files, as object files are rarely of use after linking. The paper talks of three versioning policies - keep one (the policy that comes with all filesystems), keep all (the policy that comes by default with most versioning filesystems) and keep landmarks (the coolest of the policies). The filesystems we use on a day to day basis give each file exactly one inode. Thus one can use the inode number to name files. However, EFS departs from this model as files have multiple inodes. It redefines inode numbers to index a file’s log instead of its inode.
EFS treats directories differently compared to files. The directories plainly store versioning information. The creation and deletion (if deleted) times of a file are stored. Directory entries are retained as long as at least one version of the file they name is retained.
EFS is cool but I do feel that a nomenclature style like ext3-cow would really be excellent. Have a look at EFS, it’s definitely worth it.
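A landmark rule based on time gaps can be sketched in Python like this. It is my own toy version and not the heuristic from the EFS paper: I am assuming that the last version before a long quiet period is the one worth keeping, and `min_gap` is a threshold you would pick yourself.

```python
def pick_landmarks(timestamps, min_gap):
    """Return the version times to keep under a toy landmark rule.

    A version is kept if the next version came at least min_gap seconds
    later (it closed a burst of edits), or if it is the newest version.
    """
    versions = sorted(timestamps)
    keep = [prev for prev, cur in zip(versions, versions[1:]) if cur - prev >= min_gap]
    if versions:
        keep.append(versions[-1])   # the newest version always stays
    return keep
```

With versions written at seconds 0, 10, 20, 1000, 1010, 5000 and 5005 and a `min_gap` of 500, the rule keeps 20, 1010 and 5005; everything else is a candidate for being kicked out when the drive fills up.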
September 22, 2007 No Comments
We are elite…. yes we are.
Today, an unsuspecting user arrived at #computers on IRC with a complaint about a dead disk. Linux to the rescue was what we said immediately! We made the individual download Ubuntu (sshhh…). Two hours later the download was finished and we were working on the problem.
Point 1: The drive was behaving oddly. We demanded that ubuntu be booted into and /dev/sda be mounted. How?
sudo mkdir /media/ext_dev ; sudo mount -t ntfs /dev/sda /media/ext_dev
Didn’t work.
We then asked him to use the command:
dmesg | grep /dev/sda
He says it spat out a lot of errors. I was thinking, fs error for sure. The command I told him to use:
sudo ntfsfix /dev/sda
Four vital hours later (must be one large drive), done.
It worked !
Phew, now I can get back to reading the posts on comp.sys.ibm.pc.hardware.storage
September 12, 2007 No Comments
ChironFS to keep your fs running, NILFS…
Replication of data so that it can be dumped across filesystems and, when one filesystem goes pfft, you can use the replica. How does it sound? I know, just like RAID 1 (Redundant Array of Independent Disks). I found ChironFS yesterday and it sounds cool. ChironFS is purely FUSE based and can replicate data across a variety of filesystems. Check it out at http://code.google.com/p/chironfs/
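The replication idea itself fits in a few lines of Python. This sketch has nothing to do with how ChironFS is implemented on top of FUSE; the class `MirroredStore` is a made-up name and it simply mirrors writes into several directories.

```python
import os


class MirroredStore:
    """Write every file to all replica directories; read from the first that works."""

    def __init__(self, replicas):
        self.replicas = list(replicas)

    def write(self, name, data):
        for root in self.replicas:
            os.makedirs(root, exist_ok=True)
            with open(os.path.join(root, name), "wb") as fh:
                fh.write(data)

    def read(self, name):
        for root in self.replicas:
            try:
                with open(os.path.join(root, name), "rb") as fh:
                    return fh.read()
            except OSError:
                continue              # this replica is gone, try the next one
        raise FileNotFoundError(name)
```

Point the replicas at directories on different filesystems and a read keeps working after one of them goes pfft.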
Anyway, let us move to my latest obsession, filesystems with Versioning. NILFS is one of these filesystems for Linux. It can do continuous snapshots ! And all you need to do is just specify epoch as an argument and pat comes the file that was existent once upon a time in so-and-so state. What’s more, you can mount any number of these snapshots at a given instant (read-only).
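Picking "the state at epoch so-and-so" out of a series of continuous snapshots boils down to a search over sorted timestamps. Here is a Python sketch of that lookup; it is not NILFS's interface, and `checkpoint_at` is a name I made up.

```python
from bisect import bisect_right


def checkpoint_at(checkpoints, epoch):
    """Return the latest checkpoint time that is not after epoch, or None.

    checkpoints must be a sorted list of timestamps (seconds since the epoch).
    """
    i = bisect_right(checkpoints, epoch)
    return checkpoints[i - 1] if i else None
```

Asking for a time before the first checkpoint gives None, since the file did not exist in any snapshot yet.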
I didn’t need to recompile my kernel at all as NILFS comes as a loadable kernel module.
I didn’t really go to NILFS expecting a lot of surprise. I just wanted to mess with the idea of versioning that NILFS came with. Remember, ext3-cow allows you to access the snapshot of a file as if it were in the current filesystem. NILFS however does it in a different way. I personally appreciate ext3-cow a lot more than NILFS but that’s just me.
Gotta rush so bye.
September 9, 2007 No Comments
Hard-linking directories? It’s possible
Let us face it, no matter how geeky we are or how computer-aware, hardlinks to directories need to be banned. But I don’t really care, I want to be able to use hardlinks on directories. The BSDs did bring about symbolic links but a simple rsync from your free shell directory to your PWD (present working directory) will tell you why.
I was told one of the “cracker” tricks for Linux which takes for granted that the user is geeky and dumb (possible - but I haven’t seen a species as yet). Mr. Saifi told us about hard-linking /etc/passwd into our $HOME. Try running a chown on your dir and there goes security down the bin……
Why do these problems occur? Because the very presence of symbolic links has made hardlinks obscure. Sysadmins no longer care about hardlinks - they just keep doing trash work, forgetting about hardlinks. But one filesystem plans to change it all! GCFS - Garbage Collection FileSystem. It claims that it does allow hardlinking of directories, but I was confused about what it would do to the tree-like structure of the filesystem once one allowed hardlinks. What about the possibility of infinite looping courtesy carelessness?
From the GCFS homepage, I got to know that GCFS does away with the tree like structure, allows linking a directory to a subdirectory and whatnot.
Some users may be perplexed by Meta-characters like “.” and “..” which we cannot do without. The thing is that in GCFS, the path you traced while reaching your $PWD is remembered and “cd ..” should take you to the previously inhabited directory and not necessarily the first hardlinked directory.
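The "remember the path you came by" behaviour is basically a stack. Here is a toy Python navigator over a directory graph with cycles; it is my own sketch of the idea, not GCFS code, and the class `Shell` and its graph format are invented for the example.

```python
class Shell:
    """Toy navigator over a directory graph that may contain cycles."""

    def __init__(self, graph, root="/"):
        self.graph = graph      # directory name -> set of directories linked inside it
        self.trail = [root]     # the path actually walked to reach the current directory

    @property
    def cwd(self):
        return self.trail[-1]

    def cd(self, name):
        if name == "..":
            if len(self.trail) > 1:
                self.trail.pop()        # step back along the remembered path
        elif name in self.graph.get(self.cwd, ()):
            self.trail.append(name)
        else:
            raise FileNotFoundError(name)
        return self.cwd
```

If a directory called shared is linked into both a and b, then entering it through b and typing "cd .." lands you back in b, no matter which of the two was linked first.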
Right, till next time, goodbye !
September 3, 2007 No Comments


