List of terms for the final exam. Free Frames seem to be of the same kind as the stuff we had in the last lab. TLB: We have a bunch of associative registers and hence there is parallel search. If the page number is in the associative registers, get the frame number from the TLB. Else, need to pay the page table a visit. Effective Access Time (EAT) : Assuming _alpha is the probability of a TLB hit, EAT = (time_for_tlb_lookup + time_for_mem_lookup) * _alpha + (time_for_tlb_lookup + time_for_mem_lookup + time_for_mem_lookup) * (1 - _alpha) Upon Context Switch: If there is a process tag in the TLB, you don't need to perfrom a complete flush. The process tag is used to keep the "most used" pages in the TLB all the time. The justification is that most processes tend to use very few pages and after a context switch if we manage to keep these pages inside the TLB, then things will work to our advantage. If, you evict a page from memory, also keep track of TLB entry for that page if there is one. On the other hand, if you don't have a process tag, you need to flush the TLB entries at the time of a context switch. TLB Cache Similarities: -> Both Cache A Portion Of Memory -> A miss in both cases leads to a memory lookup TLB Cache Differences: -> caches have actual data values -> TLB stores memory addresses. TLBs are usually associative and Caches can be direct mapped. Memory Protection: Valid / Invalid Bit: Basically, with every entry in the page table, keep a V/I bit. If bit is valid, then the page accessed belongs to the actual address space in question. I might imply illegal address or simply a page fault i.e need to pull frame from disk Non Executable Bit (NX bit): Basically separate text from data by marking text pages as X and data pages as NX. We replace text pages because on eviction, no write back needs to be done. Waiting for reply about shared pages. Hierarchical Page Tables: ------------------------- This is a solution to the problem of extremely large page tables. this solution is that you break up the VIRTUAL ADDRESS into several sections. Every one of these sections except the last section are page numbers. the last section is still the offset. Example: 2 level page table. You divide the logical address into 3 sections. The first two sections are page numbers indexing into 2 page tables. The third section is the page offset. The first section indexes into a "Top Level Page Table" and is used to lookup a "Second Level Page Table". The second section indexes into the "Second Level Page Table". The advantage is that the second level page table "doesn't" need to be in main memory all the time. Only the Top Level Page needs to be in memory all the time. Page Fault Handling: The Valid / Invalid bit is looked at in the page table. If the bit is invalid, the OS looks at another table to decide if: -> the address is a legal address (not outside virt addr space) -> page needs to be swapped in. So, now pull a frame from a free frame list and swap page into frame. reset the tables i.e set v/i bit to v Go back to the same process. A page fault is the only interrupt that leads to the same process being executed post a context switch. Effective Access Time: Average time taken to access a page. to calc this, you are given a page fault rate = p. EAT = (1 - p) * mem_access_for_the_page + p * (swap_page_out + swap_page_in + restart_overhead) Typically a huuuuuuge slowdown, large factors like 40 etc for 1 pagefault out of 1000. Copy On Write: This is a decent technique to reduce proc creation overhead. Typically, if a parent spawns a child process, the memory is shared initially. i.e. same virtual address space. when the child or parent makes a write operation on a page, then go ahead and copy page over. So, the disadvantage of copy-on-write is that we need extra functionality by the MMU. This is almost like mini page fault handling. This technique does achieve efficiency in memory handling though. PAGE REPLACEMENT: Find a not-used-so much page in mem and kick it out. Goal of any page replacement policy: Should lead to few page faults. Basic Idea of Page Replacement: Find location of desired page on disk. If there is a free frame, swap this page into the frame, else, prepare a page for eviction. Swap the page on disk into this newly freed frame. Update pts etc. Restart process that performed the invalid page access. When we evict, we need to write back. General benchmarking technique: Want to run it on a string of memory references and then compute the number of page faults Basically this is going be a bunch of numbers. Just run over them With Any Good Algorithm You Want: No. of pagefaults go down as we increase number of pages. If you look at the FIFO example on slide 41 in part10: An increase in the number of words increases the number of page faults if FIFO is used. Belady's Anomaly: In computer storage, Bélády's anomaly proves that it is possible to have more page faults when increasing the number of page frames while using FIFO page replacement algorithm. László Bélády demonstrated this in 1969. So, the FIFO is the worst possible you can perform. Now, you need a measure for the best. Optimal algorithm looks into the future and decides what to do The key here is to look into the future. The page is that is requested last in the future is the one you want. So, we have a measure of the worst and the best. So, the trick is to find something in the middle LRU: Least Recently Used Algorithm. This is not exactly kickA$$ since it only operates on data in the past and pretty much doesn't look ahead. So, any action this algorithm takes might result in a page-fault the very next instruction Various Implementations: Counter Implementation: maintain a counter for a page entry; every time a page is referenced, copy the value in the clock into the damned counter. When you need to evict a page, look at the value in the counter to pick the dude. Least counter value = what you want Stack Implementation: Push onto stack the most recent page accessed. If a page is referenced, you move it to the top. But this requires 6 pointers to be changed. Typically got to push the other dudes down if you pull outta the middle. The only added advantage is that no search for min counter. But we like traded a O(n) implementation for another. No benefit in a mathematician's eyes. LRU Approximation Algorithms: Reference Bit Implementation: Associate a bit with each page. This bit is initially set to 0. When referenced, set this bit to 1. So, you basically replace the page with ref bit set to 0. This is a bad implementation. In the real world it is hard to see anything set to 0. Second Chance Algorithm: Typically a FIFO + Ref Bit Implementation. When looking for a page to evict, look at them in the order they arrived. If their reference bit is set, set it back to 0. If it isn't pick it and throw it out. LFU: Least Frequently Used MFU: Most frequently used. This is used with the hope that the new pages brought in won't be used so much. The rest of the page replacement algorithms are crazy (the prof's words not mine). The solution I've come up with for The Dreaded Quiz: For the LRU, you need the largest number of consecutive 0s in the high order bits. For the LFU: The lowest number of 1s in the number. Need to perform a complete lookup of the history bits. Frame Allocation: Every process needs a minimum number of frames in memory. This is sort of architecture dependent, All pages for a particular instruction need to be in mem for the instruction to be executed. So, allocation methods: fixed allocation scheme: Equal Allocation: Each process gets the same number of frames. Proportional Allocation: Allocate frames to a process based on its size. Priority Allocation: This is a proportional allocation scheme that uses process priority rather than size. Now, on a pagefault, : -> for eviction, pick one of your own frames. -> for eviction, pick a frame from a process with a lower priority The Big Debate: Global Vs. Local Allocation: Global replacement: the process selects a replacement from the set of all frames, i.e one process can force eviction of a page that belongs to another process. Local Replacement: The victim you pick belongs to the set of frames assigned to this process. Tradeoffs of these techniques: -> Global replacement is more efficient overall. More commonly implemented. However, this can end up influencing pagefaults in other processes. -> Local replacement prevents external influences on the current process' page fault rate as long as the same number of frames is allocated but there may be free space in the system which doesn't get used i.e a page in another process that is not being used but doesn't get allocated to this process Thrashing: This provision introduces a swap in,swap-out level of intermediate CPU scheduling.Let take a example of a process that does not have enough number of frames.If the process does not have the number of frames it needs to support pages in active use, it will quickly page fault. it needs to evict another page in active use and so on. This leads to significant amount of page faults and there is very little CPU utilization in such a process. The OS then decides that there needs to be an increase in the amount of multiprogramming and as a result, it brings back extra processes that it swapped out. This increases the thrashing situation even further and so on. the main idea is that the idea behind virtual memory is that most of the memory accesses by processes are focused in a certain range of addresses in memory (i.e locality). When thrashing occurs, it basically means that the locality here is larger than the actual size of physical memory. Working Set Model: A working set at a given time is, working set window: A region we consider. This is typically a fixed number of memory accesses we look at to measure locality. Working Set: Total number of pages accessed in the most recent working set window. Need to pick working set window correctly, a very large working set window will get you several localities and very tiny working set won't get you the entire locality. Obtain the sum of all the Working Sets of all the processes running right now. If this sum > total number of frames, then thrashing is guaranteed. So, you need to suspend a process. Keeping track of the working set is a real pain because the working set window is a moving window and locality changes with time. A good way to do this would be with a timer and with a reference bit. If the reference bit is set to 1, the process is in the working set. A more accurate technique would be an n-bit structure, basically right shift the struct and copy the ref bit value over the the MSB Why is that? Basically, you have a n bit struct and interrupt interval = working_set_window / n The problem with the accuracy is that the working set window will change between [window_size, window_size + interval] time. Best trick is to probably use a larger number of bits. Other simple concepts in the slides. Too boring to be included here. FileSystems: A filesystem model has to offer the following and it works around the following: -> Byte oriented access as opposed to the block oriented access that we get. -> Naming of each file as opposed to the sector number that is used. -> Protection of one user's files from the other user's whereas on a disk plainly, there is no concept of permissions. -> A computer crash can corrupt the contents of blocks on a disk whereas in a filesystem, integrity has to be ensured irrespective of computer crashes and disk problems Now, file structures: Byte / Record Sequence: A file is stored as a sequence of bytes or record, the app determines the organization, no overall coherent meaning. Tree Structure: You have keys for records. This is for faster file access. However, where this paradigm fails is for storing images etc. Tree structure is only how the file is presented to the user. It does not have anything to do with how the actual file is stored on disk. File Access Methods: Sequential or Random. Sequential: Read bytes/records sequentially. Eg: mp3 files, and most media. Random: Read bytes / records in no particular order. Databases etc. If a file is stored sequentially, it can be a bit of a problem since the head needs to turn a lot more per request. In sequential, adjacent records should be on the same track. FILE ALLOCATION TECHNIQUES: Contiguous allocation: -> Every file is stored as a sequence of contiguous disk blocks. Advantages: Perfect for sequential access, allows random access too.random acc is easy since you can calc which block to access. Disadvantages: External Fragmentation since you might have smaller blocks lying around but you need to use the chunk that can fit this file in. Also, hard to use with files thar grow in size. Linked List Allocation: Every disk block points to the next block that belongs to the file. Advantages: Sequential access is just as good. Problem of fragmentation is eliminated. Disadvantages: Random access requires a O(n) walk through all the blocks. Also, o ne block in the middle gets corrupted, you're screwed. FAT (File Allocation Table): A table that is stored in the beginning of the partition. it holds a pointer for each block representing the next block in the file. So essentially the linked list is implemented here. Blocks need contain no pointers. This is ok. First, cache the FAT in memory and you are ok. and get decent performance for random access Indexed Allocation: inode: A structure that contains a table that represents the disk blocks that belong to a file. Only need the inode in memory. Stuff in an inode: -> File Attributes: file type, size, creator and group permissions, timestamps, reference count (no. of hardlinks to a file). -> Disk Block Addresses, (Direct and Indirect). So basically, the inode can point to the first initial file blocks. If there are more disk blocks than we can directly point to, we point to another disk block that contains more pointer to disk blocks that belong to the file, so this is 1 indirection, for the 2nd indirect, we point to a block that contains addresses to blocks which contain your data. Here is an image that describes the entire thing very well: http://www.cs.jhu.edu/~yairamir/cs418/os7/sld035.htm How to compute how much we can store: if 10 direct addresses can be stored, each block is 1 kb and each entry takes 4 bytes: The direct region in the inode allows for: 10 KB The single indirect assuming points to disk block of 1 KB, we can point to 1024/4 = 256 more, so 256KB extra. double indirect allows 1024/4 disk block addrs. each of these 256 blocks contain 256 pointers to disk blocks that belong to a file. so 256 * 256 * 1 Kilobytes extra. Keeping Track of Free Blocks: Free Block List: The free block list is a disk block that contains addresses of disk blocks that are free. The last entry in this block points to another block that continues the list of free blocks' addresses. Pros/Cons: ->Can grow extremely large (think 16 gigs free and now calc.) -> Wastes Free Blocks, the very things whose existence it supoosed to archive. -> We can only load one of these blocks in mem at a time. Free Bitmap: per every disk block, a 0 indicates not-free, 1 indicates free. Pros/Cons: -> Fixed in size (size of disk won't really change). -> Can beat free list in most cases except when there are fewer free blocks. -> the blocks allocated for a free bitmap are pretty much close together. Directories: They are also files, just special ones :). They typically store (file, inode number) pairs. i.e. a file name and the address of the file's inode. inode number: is an index into the systemwide inode table. "ROOT DIRECTORY HAS NO INODE. IT IS LOCATED IN THE BEGINNING OF THE DISK IN A WELL KNOWN LOCATION" Locating a file '/usr/bob/file': -> Get the root. Think of the FS as a tree and files as leaves. Get the inode for /usr from the root directory. -> Look up /usr -> get an inode for /usr/ (though this a dir, it is actually a file and has its own inode which is stored in the root dir. ). -> Now pull the /usr/ directory (file) from the disk block (stored in the inode number). -> /usr/ dir (file) contains dirname, inode no. pairs yet again. Pull the inode_no. for bob. -> pull bob/'s inode. There's going to be a file, inode_no. pair in the bob/ dir (file). -> Pull file's inode. You're done. HardLink: Each of the pairs in a directory is called a hard link. Reference Count: The unix command for a hardlink / softlink is "ln". You need to maintain a reference count as well. If mutliple files hardlink to the same inode, running "rm" on a file shouldn't clear it up. Only kick out the block if the ref count hits 0. Hardlinked files need to be stored in the same partition. The PCB contains all the info related such as an open file table and an inode table. An Open File Table, needs to keep track of all processes accessing a file. So, if one proc opens a file, increment ref count by 1. if a proc calls close(), decrement the ref count by 1. this is different from the inode reference count. This is system-wide so that if some proc tries to open a file that is currently opened, you can pull the block directly from there instead of looking up all the inodes. The open inodes that the open file table points to are in the open file table imp stuff in the open file table: - counter that keeps track of number of opens and closes for a file. - pointer to inode of the appropriate file. this pointer points to a location in the systemwide inode table. read() -> PCB -> OFT -> INODE -> Disk device driver -> create buffer cache in kernel mem -> copy over to buffer in user space -> return. buffer made in kernel mem so that if another proc requests the same buf, you can provide the same thing to that process. Also, if we pick the page containing the user buf, for eviction, then screwed. rest of filesystems stuff is present in the notes on the 23rd of April. Some other descriptions of screw ups in file consistency present on the slides. I/O : Communicate over a bus. A bus is a digital interconnection mechanism. Most buses come with a protocol defined. I/O Units: -> Mechanical component: the device. -> Electronic component: the controller. Device Controller: implements the disk side of the protocol, i.e. buffering, converting the serial bit stream into a block of bytes. Controller Components: Registers for data and control. Comm protocol between CPU and controllers via I/O instructions and registers. device controller: has its own instruction set. all of these are privileged instructions. 4 registers: -> Control - run command, change mode. -> Status - is the device ok, does it have a byte etc. -> Data-in - read by host -> Data-out - written to by host. Maskable Interrupts: Hardware interrupts which can be ignored by setting a but in a certain register. cpus stop detecting these during hte execution of a set critical instructions. Non-maskable interrupts: Hardware interrupts which just can't be ignored. includes non recoverable hardware errors. Typically interrupts come with an address that is an index into the interrupt vector. It is used to select an interrupt handler. Interrupt handlers for various devices are installed at boot time whe the OS probes the hardware buses. Programmed I/O: Good for byte-based devices. For block devices it is better to use DMA. Typical DMA Flow: -> CPU Executes DMA instructions to set up DMA transfer. -> DMA controller then gets a request which contains: (a). What the disk wants. (b). Name of the device. (c). Location of the desired stuff on the device. (d). length of data (e). whether reading or writing. (f). pass location in kernel memory. -> This request is entirely done in HW. -> Raise a HW interrupt when done servicing the request. Cycle Stealing: One complaint raised by the species that is never satisfied. Basically, the DMA is copying stuff from device to memory and this the cpu needs to fetch/execute/store from there. So overall, the CPU doesn't get the opportunity to do this. So, the CPU basically stalls. This is cycle stealing. To avoid this, adding an extra bus won't help. The MMU allows just one mem access at a time so no real use. The solution is the cache should contain items that are being currently requested by the CPU. If the cache is not a good approximation of the current locality, then the CPU is indeed going to be stalled. Overall, with cachine, DMA improves system performance.