Apache NiFi and Philter
Describes how to use Philter with Apache NiFi to filter sensitive information from text.
This article describes how Philter can be used with Apache NiFi to filter sensitive information such as PII and PHI within an Apache NiFi data flow.
Philter is available on the AWS, Azure, and Google Cloud marketplaces. So, fire up an instance of Philter and let's get started using it alongside your Apache NiFi data flow!
Philter is certified to work with Cloudera DataFlow (CDF) as a custom Apache NiFi processor. There are two options for deploying Philter with CDF.
In the first option, a custom NiFi processor performs redaction by communicating with an instance of Philter through Philter's API. The processor sends text to Philter for redaction and receives back the redacted text. This option requires deploying an instance of Philter alongside your Cloudera DataFlow installation. Next, get the Philter NiFi processor from GitHub. Deploy the NAR file to CDF and make it accessible to Apache NiFi.
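To make the exchange concrete, here is a minimal sketch of the kind of request the processor makes on your behalf: text goes to Philter, redacted text comes back. This is not the processor's actual code. The /api/filter path, the c (context) and p (filter profile) query parameters, and the host name used below are assumptions for illustration, so check them against the API documentation for your Philter version.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_filter_request(base_url, text, context="none", profile="default"):
    """Build (but do not send) a POST request asking Philter to redact text.

    Assumed, not confirmed by this article: the /api/filter path and the
    c (context) and p (filter profile) query parameters.
    """
    query = urlencode({"c": context, "p": profile})
    return Request(
        url=f"{base_url.rstrip('/')}/api/filter?{query}",
        data=text.encode("utf-8"),
        headers={"Content-Type": "text/plain"},
        method="POST",
    )


def redact(base_url, text, **kwargs):
    """Send the text to Philter and return the body of the response."""
    with urlopen(build_filter_request(base_url, text, **kwargs)) as response:
        return response.read().decode("utf-8")
```

Calling redact("https://philter.example.com:8080", "some text") against a running instance would return the redacted text. Building the request separately from sending it makes it easy to inspect the URL and headers before pointing the code at a real Philter instance.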
Configure the Philter processor by specifying the location of Philter and any other necessary connection configuration, as shown in the image below.
Philter NiFi processor configuration
For a production environment, a cluster of Philter instances deployed behind a load balancer would provide improved performance and increased availability over a single instance.
The second option does not require an instance of Philter. Please contact us to receive a NiFi processor with all of Philter's capabilities embedded in it. This processor performs the text redaction entirely within your NiFi data flow with no external communication required. This processor is significantly more performant than the processor in the first option. When you receive the processor NAR file from us, deploy it to NiFi.
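Deploying a NAR file usually amounts to placing it where NiFi loads its archives and then restarting NiFi. The sketch below assumes a standalone NiFi installation whose lib directory is on the local file system. The function name and the paths in the usage example are hypothetical, and a Cloudera DataFlow installation may manage NAR files differently.

```python
import shutil
from pathlib import Path


def deploy_nar(nar_file, nifi_home):
    """Copy a NAR file into NiFi's lib directory and return the new path.

    Assumes a standalone NiFi install with a local <nifi_home>/lib directory.
    NiFi typically needs a restart before it picks up the new processor.
    """
    nar_path = Path(nar_file)
    if nar_path.suffix != ".nar":
        raise ValueError(f"not a NAR file: {nar_path}")
    lib_dir = Path(nifi_home) / "lib"
    if not lib_dir.is_dir():
        raise FileNotFoundError(f"NiFi lib directory not found: {lib_dir}")
    return Path(shutil.copy2(nar_path, lib_dir))
```

For example, deploy_nar("philter-processor.nar", "/opt/nifi") would copy the file to /opt/nifi/lib. Both the file name and the installation path here are placeholders.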
Configure the processor as shown in the image below by specifying the name of the desired filter profile and filtering context:
Philter NiFi embedded processor configuration
Both processors support the same transitions. The redacted transition contains the redacted version of the flow file's content. In the example flows shown below, the top flow uses the Philter processor that calls Philter's API, and the bottom flow uses the Philter embedded processor. The two flows are otherwise identical: the only differences are the middle processors and their individual configuration.