This is a typical case of the maven-assembly plugin breaking things.
Different JARs (hadoop-common, and the JAR that provides DistributedFileSystem) each contain their own file named org.apache.hadoop.fs.FileSystem in their META-INF/services directory. This file lists the canonical class names of the file system implementations that JAR wants to declare. (This mechanism is called a Service Provider Interface, or SPI.)
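To check which JARs on a classpath (or inside an assembled fat JAR) actually declare file systems, one can list every copy of this service file that the class loader can see. The following is a minimal diagnostic sketch, not part of Hadoop: the class ServiceFileInspector and its methods are hypothetical helpers. It uses ClassLoader.getResources to find every copy and parses each one using the provider-configuration file format documented for java.util.ServiceLoader (one class name per line, '#' starts a comment).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical diagnostic helper, not a Hadoop API.
public class ServiceFileInspector {

    // Path of the SPI file that declares Hadoop FileSystem implementations.
    static final String FS_SERVICE = "META-INF/services/org.apache.hadoop.fs.FileSystem";

    // Parses a provider-configuration file: one class name per line;
    // text after '#' is a comment; blank lines and surrounding whitespace are ignored.
    static List<String> parseProviderNames(BufferedReader reader) throws IOException {
        List<String> names = new ArrayList<>();
        String line;
        while ((line = reader.readLine()) != null) {
            int hash = line.indexOf('#');
            if (hash >= 0) {
                line = line.substring(0, hash);
            }
            line = line.trim();
            if (!line.isEmpty()) {
                names.add(line);
            }
        }
        return names;
    }

    // Convenience overload for in-memory content.
    static List<String> parseProviderNames(String content) {
        try (BufferedReader reader = new BufferedReader(new StringReader(content))) {
            return parseProviderNames(reader);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Finds every copy of the given resource visible to the class loader
    // (normally one per JAR that ships it) and parses each copy.
    static Map<String, List<String>> findServiceFiles(ClassLoader loader, String path) throws IOException {
        Map<String, List<String>> result = new LinkedHashMap<>();
        Enumeration<URL> urls = loader.getResources(path);
        while (urls.hasMoreElements()) {
            URL url = urls.nextElement();
            try (BufferedReader reader = new BufferedReader(
                    new InputStreamReader(url.openStream(), StandardCharsets.UTF_8))) {
                result.put(url.toString(), parseProviderNames(reader));
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        Map<String, List<String>> files =
                findServiceFiles(ServiceFileInspector.class.getClassLoader(), FS_SERVICE);
        if (files.isEmpty()) {
            System.out.println("No " + FS_SERVICE + " on the classpath");
        }
        files.forEach((location, names) -> {
            System.out.println(location);
            names.forEach(name -> System.out.println("  " + name));
        });
    }
}
```

With the individual Hadoop JARs on the classpath you would expect one copy per declaring JAR; with a JAR built by maven-assembly-plugin, only the surviving copy shows up, which makes it easy to see whether org.apache.hadoop.hdfs.DistributedFileSystem is still listed.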
When we use maven-assembly-plugin, it merges all our JARs into one, and all the META-INF/services/org.apache.hadoop.fs.FileSystem files overwrite each other. Only one of these files remains (the last one that was added). In our case, the FileSystem list from hadoop-common overwrote the list from the JAR that provides DistributedFileSystem, so DistributedFileSystem was no longer declared.
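To make the overwrite concrete, here is a small simulation of two ways to build a single JAR: a naive one where, for each path, the copy from the last JAR added wins, and a services-aware one that concatenates META-INF/services files. It does not invoke maven-assembly-plugin; the class ServiceFileMerge and the JAR contents are hypothetical, and the service files are heavily abbreviated (the real ones list more classes). Each "JAR" is modelled as a map from entry path to lines.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical simulation of fat-JAR assembly; not a Maven API.
public class ServiceFileMerge {

    static final String SERVICES_PREFIX = "META-INF/services/";
    static final String FS_SERVICE = SERVICES_PREFIX + "org.apache.hadoop.fs.FileSystem";

    // Naive assembly: for each path, the entry from the last JAR added replaces earlier ones.
    static Map<String, List<String>> lastWins(List<Map<String, List<String>>> jars) {
        Map<String, List<String>> out = new LinkedHashMap<>();
        for (Map<String, List<String>> jar : jars) {
            out.putAll(jar);
        }
        return out;
    }

    // Services-aware assembly: META-INF/services files are concatenated in order,
    // dropping duplicate lines; every other path still uses last-wins.
    static Map<String, List<String>> mergeServices(List<Map<String, List<String>>> jars) {
        Map<String, Set<String>> services = new LinkedHashMap<>();
        Map<String, List<String>> out = new LinkedHashMap<>();
        for (Map<String, List<String>> jar : jars) {
            for (Map.Entry<String, List<String>> entry : jar.entrySet()) {
                if (entry.getKey().startsWith(SERVICES_PREFIX)) {
                    services.computeIfAbsent(entry.getKey(), k -> new LinkedHashSet<>())
                            .addAll(entry.getValue());
                } else {
                    out.put(entry.getKey(), entry.getValue());
                }
            }
        }
        for (Map.Entry<String, Set<String>> entry : services.entrySet()) {
            out.put(entry.getKey(), new ArrayList<>(entry.getValue()));
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical, abbreviated contents standing in for the HDFS JAR and hadoop-common.
        Map<String, List<String>> hdfsJar = new LinkedHashMap<>();
        hdfsJar.put(FS_SERVICE, Arrays.asList("org.apache.hadoop.hdfs.DistributedFileSystem"));
        Map<String, List<String>> commonJar = new LinkedHashMap<>();
        commonJar.put(FS_SERVICE, Arrays.asList("org.apache.hadoop.fs.LocalFileSystem"));

        // hadoop-common is added last, so its copy wins in the naive assembly.
        List<Map<String, List<String>>> jars = Arrays.asList(hdfsJar, commonJar);

        System.out.println("last wins: " + lastWins(jars).get(FS_SERVICE));
        System.out.println("merged:    " + mergeServices(jars).get(FS_SERVICE));
    }
}
```

In the naive case only the LocalFileSystem entry survives, while the merged case keeps both classes. Keeping first-seen order and dropping duplicate lines is a choice of this sketch, not necessarily what any particular plugin does.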
After loading the Hadoop configuration, but just before doing anything FileSystem-related, we set the implementation class for each scheme explicitly, so Hadoop no longer has to find it through the overwritten service file:
hadoopConfig.set("fs.hdfs.impl", org.apache.hadoop.hdfs.DistributedFileSystem.class.getName());
hadoopConfig.set("fs.file.impl", org.apache.hadoop.fs.LocalFileSystem.class.getName());
It has been brought to my attention by krookedking that there is a configuration-based way to make the maven-assembly plugin use a merged version of all the FileSystem service declarations; check out his answer below.