Kbase 13810: HP/UX fs_async asynchronous writes, immediate reporting HP
Author   Progress Software Corporation - Progress
Access   Public
Published   10/05/1998

FEB-11-1995

This notebook entry contains information from HP on the fs_async
kernel option and on immediate reporting.

Progress recommends against using fs_async because it makes the
overall file system more vulnerable. It is recommended that
customers *not* use asynchronous writing or immediate reporting
when tuning their file systems.

----------------------------------------------------------------


The fs_async kernel parameter affects only the file system
structures, that is, the inodes. When it is set (asynchronous mode),
the kernel calls the bdwrite routine internally; that is, the buffer
containing the inode is marked dirty instead of being written at
once. The buffer is then written to disk mainly when:

1 - the buffer fills up.
2 - sync() is called (every 30 seconds) and the buffer has not
been accessed since the last sync() call.
3 - the buffer migrates to the front of the free list and another
buffer is needed.

O_SYNC causes the data AND the inode information to be written to
disk synchronously.

Turning on the fs_async flag makes the file system more vulnerable.
In the event of a crash, it increases the time to recover and often
makes some data or files unrecoverable that would otherwise have
been recovered. Because it does improve file system performance, it
is mostly used during benchmarks.


Extract from "How HP-UX works: Concepts for a System Administrator"
-----------------------------------------------------------------

How the HFS File System Modifies Files

Every time a file is modified, the HP-UX operating system updates the
file system to ensure its consistency.

When a process updates (writes to) the file system, the data being
written is copied into an in-memory buffer cache. The physical disk
is updated asynchronously from the buffer write. The data, along
with the inode information reflecting the change, is written to the
disk sometime later, unless the file was opened in the synchronous
mode (see the description of O_SYNC and O_SYNCIO in open(2) and
fcntl(2) in the HP-UX Reference). The process continues, even
though the data has not yet been written to the disk. If the system
is halted without writing the buffer to disk, the file system on the
disk is left in an inconsistent state. Such inconsistencies are
flagged and corrected, if possible, by the fsck command at system
startup. (Discussions of fsck, the file-system check command,
appear later in this chapter, in Solving HP-UX Problems, and in
fsck(1M) of the HP-UX Reference Manual.)
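
As an illustration of the synchronous mode mentioned above, the
following is a minimal sketch, not taken from the HP manual, of a
write that uses O_SYNC. The helper name write_synchronously and its
return convention are assumptions made for this example; only
open(2), write(2) and close(2) are real calls. Interrupted writes
(EINTR) are treated as errors to keep the sketch short.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write len bytes from buf to path with O_SYNC.
 * Returns 0 on success, -1 on any error. */
int write_synchronously(const char *path, const void *buf, size_t len)
{
    /* O_SYNC asks that each write(2) on this descriptor complete to
     * disk (data and inode information) before it returns. */
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC | O_SYNC, 0644);
    if (fd < 0)
        return -1;

    const char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n <= 0) {            /* error or no progress: give up */
            close(fd);
            return -1;
        }
        p += n;                  /* handle short writes */
        len -= (size_t)n;
    }
    return close(fd);
}
```

A caller might use write_synchronously("/tmp/example.dat", "hello", 5)
and treat a nonzero result as a failed write. Without O_SYNC in the
open flags, the same call would return as soon as the data reached
the buffer cache.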

The sync command can be used to force synchronization. However, the
syncer command routinely updates the file system's superblock,
inodes, data blocks, and cylinder group information, as described
below. (For further information, see sync(1M) and syncer(1M) in
HP-UX Reference Manual.)


Primary Superblock  The superblock of a mounted file system is
                    written to the disk whenever a umount command
                    is issued, or when a sync command is issued and
                    the file system has been modified.

Inodes              An inode contains information describing the
                    file. The inode is written to disk after every
                    modification, unless the fs_async parameter is
                    set in the configuration (S800 or dfile) file.
                    (See "Synchronous vs. Asynchronous Disk Writes"
                    later in this chapter.)

Data blocks         In-core blocks (including directories, indirect
                    blocks, files, pipes, symbolic links, and FIFOs)
                    are written to the file system after being
                    modified and released by the operating system.
                    Upon release, data blocks are buffered or queued
                    for eventual writing. Physical I/O takes place
                    when the buffer is needed by HP-UX, when
                    a sync or fsync command is issued, or when
                    O_SYNC is set for the file. If a
                    file is opened with the O_SYNC or O_SYNCIO flag
                    set, the write system call does not return
                    until completed.

Cylinder group      The cylinder group information is updated
                    whenever a sync is executed, or when the system
                    needs a buffer and the cylinder group is
                    written.
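
The data block behavior described above can be sketched from the
application side: a normal write lands in the buffer cache, and an
explicit fsync(2) forces the physical I/O. This is only an
illustrative sketch; the helper name append_and_flush is an
assumption for this example and does not come from the HP manual.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: append len bytes to path through the buffer
 * cache, then force them to disk before reporting success.
 * Returns 0 on success, -1 on any error. */
int append_and_flush(const char *path, const void *buf, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0644);
    if (fd < 0)
        return -1;

    /* Without O_SYNC, write(2) returns once the data is buffered. */
    ssize_t n = write(fd, buf, len);
    if (n < 0 || (size_t)n != len) {
        close(fd);
        return -1;
    }

    /* fsync(2) blocks until the buffered data for fd is on disk. */
    if (fsync(fd) != 0) {
        close(fd);
        return -1;
    }
    return close(fd);
}
```

This pattern lets an application batch several buffered writes and
pay the synchronous cost once, at the point where durability
matters, rather than on every write as with O_SYNC.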

_________________________________________________________________

CAUTION
* Always unmount a file system before executing fsck.
* Always reboot the system without syncing (that is,
use reboot -n) after altering the root device with
fsck.

A file system can become inconsistent if you execute fsck on
a mounted file system other than the root file system; you
risk missing buffered information not yet written to the
file system. If this information is then flushed from the
buffer cache, it might overwrite corrections that fsck had
made.

__________________________________________________________________


Immediate Reporting

Numerous SCSI disk devices are shipped with a feature called
immediate reporting (disabled by default). Immediate reporting
speeds status notification; its implementation is handled by the
disk controller and disk device. However, immediate reporting
also has some associated risks.

With immediate reporting, when a device driver sends a write request
to a device, the device accepts the data, places it in its buffer or
its cache, and reports to the SPU that the write completed
successfully. Without immediate reporting, status is not returned
until the data goes to the media itself.

In a power (or other) failure, data might not have been written
successfully to disk but might, in fact, still reside in a buffer.
An application writing to the raw device, or to the file system
using O_SYNC, continues processing as though the data has been
written. If data remains in the buffer at the time of a system
failure, the database is left in an inconsistent state.

Under rare circumstances, immediate reporting might also cause
delayed errors or system panics. This can occur in the following
scenario: A user has a write request and the system returns good
status immediately. If the next request is a kernel request and an
error occurs (such as a write failure) caused by the user's write
request, the error might get associated with the kernel request. If
the kernel request cannot tolerate the error, the kernel might panic.

Immediate reporting can be enabled or disabled using the
scsictl(1M) command. If it is critical that your system never go
down, you might want to disable immediate reporting. Although SCSI
disks available for Series 800 systems can be set for immediate
reporting, the feature poses greater risk of inconsistent data; the
disks are shipped with the feature disabled.


Synchronous vs. Asynchronous Disk Writes

When HP-UX writes a file-system data structure to disk synchronously,
any file-system activity must complete to the disk before the program
is allowed to continue; the process does not regain control until
completion of the physical I/O (regardless of whether the I/O is
user data or operating-system data). Synchronous writes include some
file-system structures and whatever an application writes with
O_SYNC set.

When HP-UX writes to disk asynchronously, I/O is scheduled at some
later time and the process regains control immediately, without
waiting.
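
To make the difference between the two modes concrete, the sketch
below times the same sequence of writes with and without O_SYNC.
It is an illustration only: the helper name time_block_writes, the
1 KB block size, and the use of clock_gettime(2) are assumptions
for this example, and the numbers it yields depend entirely on the
system and disk in use.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: write `count` blocks of 1 KB to path, opened
 * with the given extra flags (0 or O_SYNC), and return the elapsed
 * wall-clock time in seconds. Returns a negative value on error. */
double time_block_writes(const char *path, int extra_flags, int count)
{
    char block[1024];
    struct timespec t0, t1;

    memset(block, 'x', sizeof block);

    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC | extra_flags, 0644);
    if (fd < 0)
        return -1.0;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < count; i++) {
        /* With O_SYNC each call waits for the physical I/O; without
         * it the call returns once the block is in the buffer cache. */
        if (write(fd, block, sizeof block) != (ssize_t)sizeof block) {
            close(fd);
            return -1.0;
        }
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);
    close(fd);

    return (double)(t1.tv_sec - t0.tv_sec)
         + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
}
```

Calling time_block_writes(path, 0, n) and then
time_block_writes(path, O_SYNC, n) on the same file system gives a
rough feel for the cost of waiting on each physical write.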

By default, some critical changes to the structure of the file system
are posted to disk synchronously. Synchronous writes ensure file
system integrity in case of system crash, but this kind of disk
writing also impedes system performance. Run-time performance
increases significantly (up to roughly ten percent) on I/O-intensive
applications when all disk writes occur asynchronously; little
effect is seen for compute-bound processes. However, if a system
using asynchronous disk writes crashes, recovery might require
system-administrator intervention using fsck and might also cause
user data or directories to disappear.

As a system administrator, you can specify whether some disk writes
are performed synchronously or asynchronously. The fs_async
parameter in the S800 (Series 800) or dfile (Series 300/400/700)
configuration file enables and disables the feature for inodes.
(You cannot modify whether or not other types of disk writes occur
synchronously. They are asynchronous by default and synchronous if
the O_SYNC flag is set by the application.)

* On the Series 300/400/800, the fs_async default value of 0
specifies that the writes should be performed synchronously.
Setting fs_async to 1 causes fewer writes to be performed
synchronously. Typically, this causes file-system performance
to improve.
* On Series 700 systems only, the default value of 0 specifies
that writes be performed asynchronously.

Note too that fs_async deals with inodes and directories, while
O_SYNC deals with files and data. If a file is opened with O_SYNC,
the file continues to be written synchronously, regardless of which
method is specified. O_SYNC also causes inodes to be updated
synchronously.

Although asynchronous disk writes increase system performance for
most applications, if a system crashes, file-system data structures
are likely to be left in an inconsistent state. For this reason, we
do not recommend that you turn on fs_async on a production system.

Normally, file-system recovery is performed automatically by fsck
in the reboot process and does not require any intervention by the
system administrator. However, using asynchronous disk writes might
require system administrator intervention in the event of a crash.
For further information, refer to fsck(1M) in the HP-UX Reference.


Progress Software Technical Support Note # 13810