Saturday, December 20, 2014

Importance of Trash folder in HDFS

HDFS Trash Folder

The Hadoop trash feature helps prevent accidental deletion of files and directories. If trash is enabled and a file or directory is deleted using the Hadoop shell, the file is moved to the .Trash directory in the user's home directory instead of being deleted. Deleted files are initially moved to the Current sub-directory of the .Trash directory, and their original path is preserved. Files in .Trash are permanently removed after a user-configurable time interval. The interval setting also enables trash checkpointing, where the Current directory is periodically renamed using a timestamp. Files and directories in the trash can be restored simply by moving them to a location outside the .Trash directory.
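To make the path layout concrete, here is a minimal Python sketch of where a deleted file would be expected to land and how it could be moved back. The helper names `trash_path` and `restore_command` are hypothetical, and the sketch assumes the user's home directory is /user/<username>, which is a common default but may differ on your cluster. It only builds the path and the `hdfs dfs -mv` argument list; it does not contact a cluster.

```python
import posixpath


def trash_path(original_path, user, home_prefix="/user"):
    """Return the expected location of a deleted path under .Trash/Current.

    Assumes the home directory is <home_prefix>/<user>; adjust home_prefix
    if your cluster uses a different layout.
    """
    if not original_path.startswith("/"):
        raise ValueError("expected an absolute HDFS path")
    trash_root = posixpath.join(home_prefix, user, ".Trash", "Current")
    # The original absolute path is preserved beneath the Current directory.
    return trash_root + posixpath.normpath(original_path)


def restore_command(original_path, user, home_prefix="/user"):
    """Build (but do not run) the shell command that moves a file out of trash."""
    source = trash_path(original_path, user, home_prefix)
    return ["hdfs", "dfs", "-mv", source, original_path]


if __name__ == "__main__":
    print(trash_path("/data/logs/app.log", "alice"))
    print(" ".join(restore_command("/data/logs/app.log", "alice")))
```

The command list can be passed to `subprocess.run` on a machine with the Hadoop client installed. Once the current directory has been checkpointed, the file sits under a timestamped directory instead of Current, so the source path would need to be adjusted accordingly.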

Enabling and Disabling Trash

Go to the HDFS service.
Select Configuration > View and Edit.
Click the Gateway Default Group category.
Check or uncheck the Use Trash checkbox.
Click the Save Changes button.
Restart the HDFS service.

Setting the Trash Interval

Go to the HDFS service.
Select Configuration > View and Edit.
Click the NameNode Default Group category.
Specify the Filesystem Trash Interval property, which controls both the number of minutes after which a trash checkpoint directory is deleted and the number of minutes between trash checkpoints. For example, to have deleted files permanently removed after 24 hours, set the value of the Filesystem Trash Interval property to 1440.

Note: The trash interval is measured from the point at which the files are moved to trash, not from the last time the files were modified.

Click the Save Changes button.
Restart the HDFS service.
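The interval arithmetic can be sketched in a few lines of Python. The function names are hypothetical, and the sketch assumes the Filesystem Trash Interval corresponds to the `fs.trash.interval` setting in core-site.xml, where a value of 0 disables trash. Because removal happens at checkpoint granularity, the computed time is best read as the earliest point at which a file becomes eligible for removal, not an exact deletion time.

```python
from datetime import datetime, timedelta


def trash_interval_minutes(hours):
    """Convert a retention period in hours to the minutes value the property expects."""
    return int(hours * 60)


def earliest_purge_time(moved_to_trash_at, interval_minutes):
    """Return the earliest time a trashed file becomes eligible for removal.

    The interval is counted from when the file was moved to trash,
    not from its last modification time.
    """
    if interval_minutes <= 0:
        raise ValueError("an interval of 0 means trash is disabled")
    return moved_to_trash_at + timedelta(minutes=interval_minutes)


if __name__ == "__main__":
    minutes = trash_interval_minutes(24)
    moved = datetime(2014, 12, 20, 9, 0)
    print(minutes)
    print(earliest_purge_time(moved, minutes))
```

For the 24-hour example above, the conversion yields 1440, matching the value entered in the Filesystem Trash Interval field.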

Source: Cloudera
