This project has retired. For details please refer to its Attic page.
DoFn (Apache Crunch 0.3.0-incubating API)

org.apache.crunch
Class DoFn<S,T>

java.lang.Object
  extended by org.apache.crunch.DoFn<S,T>
All Implemented Interfaces:
Serializable
Direct Known Subclasses:
Aggregate.TopKFn, CombineFn, FilterFn, JoinFn, MapFn, MapKeysFn, MapValuesFn, Protos.TextToProtoFn, Sample.SamplerFn

public abstract class DoFn<S,T>
extends Object
implements Serializable

Base class for all data processing functions in Crunch.

Note that all DoFn instances implement Serializable, so all of their non-transient member variables must implement Serializable as well. If your DoFn depends on non-serializable classes for data processing, declare fields of those types as transient and initialize them in the DoFn's initialize method.
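Below is a minimal sketch of the transient-field pattern described above. It declares small stand-ins for Emitter and DoFn that mirror only the lifecycle signatures listed on this page, which keeps it runnable without Crunch or Hadoop on the classpath. Emitter is reduced to a single emit method, which is an assumption. In a real pipeline you would extend org.apache.crunch.DoFn instead. Tokenizer and SplitWordsFn are hypothetical names. Serializing the function with java.io.ObjectOutputStream succeeds because the non-serializable Tokenizer field is transient and is rebuilt in initialize().

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

public class TransientFieldExample {

  // Stand-in for org.apache.crunch.Emitter (assumption: only emit is modeled).
  interface Emitter<T> {
    void emit(T value);
  }

  // Stand-in for org.apache.crunch.DoFn, mirroring the documented lifecycle methods.
  abstract static class DoFn<S, T> implements Serializable {
    public void initialize() {}
    public abstract void process(S input, Emitter<T> emitter);
    public void cleanup(Emitter<T> emitter) {}
  }

  // Hypothetical helper that does NOT implement Serializable.
  static class Tokenizer {
    String[] split(String line) {
      return line.trim().split("\\s+");
    }
  }

  // Hypothetical DoFn that splits lines into words.
  static class SplitWordsFn extends DoFn<String, String> {
    // transient: skipped by Java serialization, so the DoFn can still be shipped to tasks
    private transient Tokenizer tokenizer;

    @Override
    public void initialize() {
      tokenizer = new Tokenizer(); // rebuilt after deserialization, before any process call
    }

    @Override
    public void process(String line, Emitter<String> emitter) {
      for (String word : tokenizer.split(line)) {
        emitter.emit(word);
      }
    }
  }

  // Serializes an object the same way java.io serialization would when shipping it.
  static byte[] serialize(Object o) throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
      out.writeObject(o);
    }
    return bytes.toByteArray();
  }

  public static void main(String[] args) throws IOException {
    SplitWordsFn fn = new SplitWordsFn();
    fn.initialize();
    serialize(fn); // succeeds because the Tokenizer field is transient
    List<String> out = new ArrayList<>();
    fn.process("hello  crunch world", out::add);
    System.out.println(out); // [hello, crunch, world]
  }
}
```

If the transient modifier were removed, the same serialize call would throw java.io.NotSerializableException for Tokenizer.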

See Also:
Serialized Form

Constructor Summary
DoFn()
           
 
Method Summary
 void cleanup(Emitter<T> emitter)
          Called during the cleanup of the MapReduce job this DoFn is associated with.
 void configure(org.apache.hadoop.conf.Configuration conf)
          Called during the job planning phase.
 void initialize()
          Called during the setup of the MapReduce job this DoFn is associated with.
abstract  void process(S input, Emitter<T> emitter)
          Processes the records from a PCollection.
 float scaleFactor()
          Returns an estimate of how applying this function to a PCollection will cause it to change in size.
 void setConfigurationForTest(org.apache.hadoop.conf.Configuration conf)
          Sets a Configuration instance to be used during unit tests.
 void setContext(org.apache.hadoop.mapreduce.TaskInputOutputContext<?,?,?,?> context)
          Called during setup to pass the TaskInputOutputContext to this DoFn instance.
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

DoFn

public DoFn()
Method Detail

configure

public void configure(org.apache.hadoop.conf.Configuration conf)
Called during the job planning phase. Subclasses may override this method in order to modify the configuration of the Job that this DoFn instance belongs to.

Parameters:
conf - The Configuration instance for the Job.
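The following sketch shows a configure override that records a setting in the job configuration at planning time. It stays self-contained by using a stand-in Configuration class that models only set(String, String) and get(String), two methods that org.apache.hadoop.conf.Configuration does provide. The DoFn and Emitter stand-ins mirror only the signatures on this page. The key "example.threshold" and the class ThresholdFn are hypothetical. In real Crunch code the threshold field would also travel with the serialized DoFn, so the example only illustrates the hook itself.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ConfigureExample {

  // Stand-in for org.apache.hadoop.conf.Configuration; only set/get are modeled.
  static class Configuration {
    private final Map<String, String> props = new HashMap<>();
    public void set(String name, String value) { props.put(name, value); }
    public String get(String name) { return props.get(name); }
  }

  // Stand-in for org.apache.crunch.Emitter (only emit is modeled).
  interface Emitter<T> {
    void emit(T value);
  }

  // Stand-in for the configure/process part of org.apache.crunch.DoFn.
  abstract static class DoFn<S, T> {
    public void configure(Configuration conf) {}
    public abstract void process(S input, Emitter<T> emitter);
  }

  // Hypothetical DoFn that keeps values at or above a threshold.
  static class ThresholdFn extends DoFn<Integer, Integer> {
    static final String THRESHOLD_KEY = "example.threshold"; // hypothetical key
    private final int threshold;

    ThresholdFn(int threshold) {
      this.threshold = threshold;
    }

    @Override
    public void configure(Configuration conf) {
      // Runs during job planning: modifies the Job's configuration.
      conf.set(THRESHOLD_KEY, Integer.toString(threshold));
    }

    @Override
    public void process(Integer input, Emitter<Integer> emitter) {
      if (input >= threshold) {
        emitter.emit(input);
      }
    }
  }

  public static void main(String[] args) {
    Configuration jobConf = new Configuration();
    new ThresholdFn(10).configure(jobConf); // in Crunch, the planner makes this call
    System.out.println(jobConf.get(ThresholdFn.THRESHOLD_KEY)); // 10

    List<Integer> kept = new ArrayList<>();
    ThresholdFn fn = new ThresholdFn(10);
    fn.process(5, kept::add);
    fn.process(12, kept::add);
    System.out.println(kept); // [12]
  }
}
```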

process

public abstract void process(S input,
                             Emitter<T> emitter)
Processes the records from a PCollection.

Note: Crunch may reuse a single input record object whose contents change on each process(Object, Emitter) call. This behavior is imposed by Hadoop's Reducer implementation: the framework reuses the key and value objects passed into reduce, so applications should clone any objects they want to keep a copy of.

Parameters:
input - The input record.
emitter - The emitter to send the output to
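The object-reuse caveat can be demonstrated without Hadoop. This sketch simulates a framework that refills one mutable record for every process call. It then compares a DoFn that keeps references with one that keeps copies. MutableRecord, its copy() method, KeepReferenceFn, KeepCopyFn and the feed helper are all hypothetical. The DoFn and Emitter stand-ins mirror only the process signature documented here.

```java
import java.util.ArrayList;
import java.util.List;

public class ObjectReuseExample {

  // Stand-in for org.apache.crunch.Emitter (only emit is modeled).
  interface Emitter<T> {
    void emit(T value);
  }

  // Stand-in for the process method of org.apache.crunch.DoFn.
  abstract static class DoFn<S, T> {
    public abstract void process(S input, Emitter<T> emitter);
  }

  // Hypothetical mutable record, similar in spirit to a reused Hadoop Writable.
  static class MutableRecord {
    String value;

    MutableRecord copy() {
      MutableRecord c = new MutableRecord();
      c.value = value;
      return c;
    }
  }

  // Buggy under object reuse: stores references to the shared input object.
  static class KeepReferenceFn extends DoFn<MutableRecord, MutableRecord> {
    final List<MutableRecord> seen = new ArrayList<>();

    @Override
    public void process(MutableRecord input, Emitter<MutableRecord> emitter) {
      seen.add(input);
    }
  }

  // Safe under object reuse: stores a copy of each input.
  static class KeepCopyFn extends DoFn<MutableRecord, MutableRecord> {
    final List<MutableRecord> seen = new ArrayList<>();

    @Override
    public void process(MutableRecord input, Emitter<MutableRecord> emitter) {
      seen.add(input.copy());
    }
  }

  // Simulates a framework that refills one record object for every call.
  static void feed(DoFn<MutableRecord, MutableRecord> fn, String... values) {
    MutableRecord reused = new MutableRecord();
    for (String v : values) {
      reused.value = v;
      fn.process(reused, out -> {});
    }
  }

  static List<String> values(List<MutableRecord> records) {
    List<String> result = new ArrayList<>();
    for (MutableRecord r : records) {
      result.add(r.value);
    }
    return result;
  }

  public static void main(String[] args) {
    KeepReferenceFn buggy = new KeepReferenceFn();
    KeepCopyFn safe = new KeepCopyFn();
    feed(buggy, "a", "b", "c");
    feed(safe, "a", "b", "c");
    System.out.println(values(buggy.seen)); // [c, c, c]
    System.out.println(values(safe.seen));  // [a, b, c]
  }
}
```

Copying costs an allocation per kept record. It is only needed when a reference outlives the current process call.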

initialize

public void initialize()
Called during the setup of the MapReduce job this DoFn is associated with. Subclasses may override this method to do appropriate initialization.


cleanup

public void cleanup(Emitter<T> emitter)
Called during the cleanup of the MapReduce job this DoFn is associated with. Subclasses may override this method to do appropriate cleanup.

Parameters:
emitter - The emitter that was used for output
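As a sketch of how initialize, process and cleanup can work together, the DoFn below accumulates word counts in memory and emits the totals only when cleanup is called. It assumes the framework calls initialize once, then process for each record, then cleanup once. The stand-in DoFn and Emitter mirror only the signatures on this page. LocalWordCountFn and its tab-separated output format are hypothetical; real Crunch code would more likely emit a typed key/value pair.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class LifecycleExample {

  // Stand-in for org.apache.crunch.Emitter (only emit is modeled).
  interface Emitter<T> {
    void emit(T value);
  }

  // Stand-in for org.apache.crunch.DoFn's lifecycle methods.
  abstract static class DoFn<S, T> {
    public void initialize() {}
    public abstract void process(S input, Emitter<T> emitter);
    public void cleanup(Emitter<T> emitter) {}
  }

  // Hypothetical DoFn that counts words in memory and emits totals at cleanup.
  static class LocalWordCountFn extends DoFn<String, String> {
    private transient Map<String, Integer> counts;

    @Override
    public void initialize() {
      counts = new HashMap<>(); // fresh state for this task
    }

    @Override
    public void process(String word, Emitter<String> emitter) {
      counts.merge(word, 1, Integer::sum); // nothing is emitted per record
    }

    @Override
    public void cleanup(Emitter<String> emitter) {
      // Flush the accumulated state through the same emitter used for output.
      for (Map.Entry<String, Integer> e : counts.entrySet()) {
        emitter.emit(e.getKey() + "\t" + e.getValue());
      }
    }
  }

  public static void main(String[] args) {
    List<String> out = new ArrayList<>();
    LocalWordCountFn fn = new LocalWordCountFn();
    fn.initialize();
    for (String w : new String[] {"a", "b", "a"}) {
      fn.process(w, out::add);
    }
    fn.cleanup(out::add);
    System.out.println(out); // one line per distinct word; order follows HashMap iteration
  }
}
```

This pattern emits fewer records than one output per input. It is only appropriate when the number of distinct keys fits comfortably in task memory.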

setContext

public void setContext(org.apache.hadoop.mapreduce.TaskInputOutputContext<?,?,?,?> context)
Called during setup to pass the TaskInputOutputContext to this DoFn instance.


setConfigurationForTest

public void setConfigurationForTest(org.apache.hadoop.conf.Configuration conf)
Sets a Configuration instance to be used during unit tests.

Parameters:
conf - The Configuration instance.

scaleFactor

public float scaleFactor()
Returns an estimate of how applying this function to a PCollection will cause it to change in size. The optimizer uses these estimates to decide where to break up dependent MapReduce jobs into separate Map and Reduce phases in order to minimize I/O.

Subclasses of DoFn that will substantially alter the size of the resulting PCollection should override this method.
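The following sketch shows a highly selective filter overriding scaleFactor. The 1.0f default in the stand-in DoFn is an assumption chosen as a neutral "same size" estimate, and the 0.01f value assumes that roughly 1% of lines are errors. ErrorLinesFn is a hypothetical class. The estimateOutputBytes helper is also hypothetical: it shows how such an estimate could be multiplied into an input size and is not Crunch's actual planner logic.

```java
public class ScaleFactorExample {

  // Stand-in for org.apache.crunch.Emitter (only emit is modeled).
  interface Emitter<T> {
    void emit(T value);
  }

  // Stand-in for org.apache.crunch.DoFn's process and scaleFactor methods.
  abstract static class DoFn<S, T> {
    public abstract void process(S input, Emitter<T> emitter);

    // Assumed neutral default for this stand-in: output about the same size as input.
    public float scaleFactor() {
      return 1.0f;
    }
  }

  // Hypothetical filter expected to keep only a small fraction of its input.
  static class ErrorLinesFn extends DoFn<String, String> {
    @Override
    public void process(String line, Emitter<String> emitter) {
      if (line.startsWith("ERROR")) {
        emitter.emit(line);
      }
    }

    @Override
    public float scaleFactor() {
      return 0.01f; // assumption: roughly 1% of lines are errors
    }
  }

  // Hypothetical helper: scales an input size by the function's estimate.
  static long estimateOutputBytes(long inputBytes, DoFn<?, ?> fn) {
    return Math.round(inputBytes * (double) fn.scaleFactor());
  }

  public static void main(String[] args) {
    long inputBytes = 1_000_000_000L;
    System.out.println(estimateOutputBytes(inputBytes, new ErrorLinesFn()));
  }
}
```

A function that expands its input, such as one emitting several records per line, would instead return a value greater than 1.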



Copyright © 2012 The Apache Software Foundation. All Rights Reserved.