S4 is a general-purpose, distributed, scalable, fault-tolerant, pluggable platform that allows programmers to easily develop applications for processing continuous unbounded streams of data.
S4 fills the gap between complex proprietary systems and batch-oriented open source computing platforms. We aim to develop a high-performance computing platform that hides the complexity inherent in parallel processing systems from the application programmer.
The core platform is written in Java. The implementation is modular and pluggable, and S4 applications can be combined easily and dynamically to create more sophisticated stream processing systems.
S4 was initially released by Yahoo! Inc. in October 2010 and has been an Apache Incubator project since September 2011. It is licensed under the Apache 2.0 license.
S4 has been deployed in production systems at Yahoo! to process thousands of search queries per second.
All nodes are symmetric with no centralized service and no single point of failure. This greatly simplifies deployments and cluster configuration changes.
Throughput increases linearly as additional nodes are added to the cluster. There is no predefined limit on the number of nodes that can be supported.
Applications can easily be written and deployed using a simple API. Building blocks of the platform (message queues and processors, serializer, checkpointing backend) can be replaced by custom implementations.
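To make the idea of replaceable building blocks concrete, here is a minimal sketch of a pluggable serializer. It is not the S4 API: the names `Serializer`, `Utf8Serializer`, `Base64Serializer`, and `Node` are hypothetical and chosen only for illustration. The point is that the node depends on an interface, so a custom implementation can be substituted without touching the node itself.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Illustrative only: every type name here is hypothetical, not part of S4.
public class PluggableSerializerSketch {

    /** Contract that a platform could define for its serialization building block. */
    interface Serializer {
        byte[] serialize(String event);
        String deserialize(byte[] bytes);
    }

    /** A default implementation: plain UTF-8 bytes. */
    static final class Utf8Serializer implements Serializer {
        @Override
        public byte[] serialize(String event) {
            return event.getBytes(StandardCharsets.UTF_8);
        }

        @Override
        public String deserialize(byte[] bytes) {
            return new String(bytes, StandardCharsets.UTF_8);
        }
    }

    /** A custom replacement: Base64-encodes the UTF-8 bytes. */
    static final class Base64Serializer implements Serializer {
        @Override
        public byte[] serialize(String event) {
            return Base64.getEncoder().encode(event.getBytes(StandardCharsets.UTF_8));
        }

        @Override
        public String deserialize(byte[] bytes) {
            return new String(Base64.getDecoder().decode(bytes), StandardCharsets.UTF_8);
        }
    }

    /** A node that only knows the interface, so the implementation can be swapped. */
    static final class Node {
        private final Serializer serializer;

        Node(Serializer serializer) {
            this.serializer = serializer;
        }

        /** Encodes an event as it would be sent on the wire, then decodes it again. */
        String roundTrip(String event) {
            byte[] wire = serializer.serialize(event);
            return serializer.deserialize(wire);
        }
    }

    public static void main(String[] args) {
        String event = "query=stream processing";
        // Both nodes return the original event; only the wire format differs.
        System.out.println(new Node(new Utf8Serializer()).roundTrip(event));
        System.out.println(new Node(new Base64Serializer()).roundTrip(event));
    }
}
```

In this sketch the implementation is passed in through the constructor. A real platform would more likely select it through configuration, but the principle of coding against an interface is the same.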
S4 hides all cluster management tasks using a communication layer built on top of ZooKeeper, a distributed, open-source coordination service for distributed applications.
When a server in the cluster fails, a stand-by server is automatically activated to take over its tasks. Checkpointing and recovery minimize state loss.
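The interplay between fail-over and checkpointing can be sketched as follows. This is a simplified, single-process illustration under stated assumptions, not S4's actual mechanism: `CheckpointStore` and `Worker` are hypothetical names, the store is an in-memory map standing in for a checkpointing backend, and the "state" is just an event counter saved every few events.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative only: all names are hypothetical and the store is in-memory.
public class CheckpointFailoverSketch {

    /** Stand-in for a checkpointing backend: partition id -> last saved count. */
    static final class CheckpointStore {
        private final Map<Integer, Long> saved = new HashMap<>();

        void save(int partition, long count) {
            saved.put(partition, count);
        }

        long load(int partition) {
            return saved.getOrDefault(partition, 0L);
        }
    }

    /** A worker that counts events for one partition and checkpoints periodically. */
    static final class Worker {
        private final int partition;
        private final CheckpointStore store;
        private final int checkpointEvery;
        private long count;

        Worker(int partition, CheckpointStore store, int checkpointEvery) {
            this.partition = partition;
            this.store = store;
            this.checkpointEvery = checkpointEvery;
            // Recovery: start from the last checkpoint, or from 0 if none exists.
            this.count = store.load(partition);
        }

        void process(String event) {
            count++;
            if (count % checkpointEvery == 0) {
                store.save(partition, count);
            }
        }

        long count() {
            return count;
        }
    }

    public static void main(String[] args) {
        CheckpointStore store = new CheckpointStore();

        // The active worker handles 12 events, checkpointing every 5 events.
        Worker active = new Worker(0, store, 5);
        for (int i = 0; i < 12; i++) {
            active.process("event-" + i);
        }

        // The active worker fails here. A stand-by takes over the same partition
        // and resumes from the last checkpoint (taken at 10 events).
        Worker standby = new Worker(0, store, 5);
        System.out.println("recovered count = " + standby.count()); // prints "recovered count = 10"
    }
}
```

The stand-by resumes at 10 rather than 12, so the two events processed after the last checkpoint are lost. This is why the text says state loss is minimized rather than eliminated: a shorter checkpoint interval reduces the loss at the cost of more frequent writes.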
Apache S4 is an effort undergoing incubation at the Apache Software Foundation (ASF), sponsored by the Apache Incubator PMC. Incubation is required of all newly accepted projects until a further review indicates that the infrastructure, communications, and decision making process have stabilized in a manner consistent with other successful ASF projects. While incubation status is not necessarily a reflection of the completeness or stability of the code, it does indicate that the project has yet to be fully endorsed by the ASF.