What is Philter?

Answers to some common questions about Philter.

What is Philter?

Philter is an application that finds and removes sensitive information, such as protected health information (PHI) and personally identifiable information (PII), from natural language text in text files or PDF documents. The types of sensitive information that Philter identifies are configurable, so you can add custom types specific to your domain and use-case.
Given text as input, Philter applies a sequence of filters to find and remove the desired sensitive information, then returns the filtered text. Philter was designed with simplicity in mind so that it is easy to integrate into existing systems.
Philter is ideal for text processing pipelines in which sensitive information needs to be removed or redacted from text. Philter runs in your cloud and is available on the AWS, Azure, and GCP cloud marketplaces for easy deployment into virtual private clouds. Philter supports AWS GovCloud.

How does Philter work?

At a high level, when you send text to Philter, Philter looks for sensitive information in the text, manipulates the sensitive information based on how Philter is configured, and returns the filtered (redacted) text.
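
The find, manipulate, and return flow can be illustrated with a minimal Python sketch. This is not Philter's implementation: the two regular-expression filters, the `{{{REDACTED-type}}}` replacement format, and the function name `filter_text` are all assumptions chosen for illustration.

```python
import re

# Hypothetical filters: each pairs a type label with a pattern.
# Philter's real filters are configured separately and are not shown here.
FILTERS = [
    ("ssn", re.compile(r"\b\d{3}-\d{2}-\d{4}\b")),
    ("email-address", re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")),
]


def filter_text(text: str) -> str:
    """Apply each filter in order and replace every match with a placeholder."""
    for label, pattern in FILTERS:
        text = pattern.sub("{{{REDACTED-" + label + "}}}", text)
    return text


if __name__ == "__main__":
    print(filter_text("Contact jane@example.com, SSN 123-45-6789."))
```

The caller only ever sees the text that comes back, which mirrors the "send text in, get filtered text out" model described above.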

Where can I run Philter?

Anywhere! Philter is not constrained to any cloud provider or on-premises environment. You can run Philter in AWS, Azure, GCP, or any cloud provider. Or, you can run Philter in a Kubernetes cluster or on bare metal.

How do I send text to Philter?

You send text to Philter through its API. All interactions with Philter go through the API, which has a method that accepts text as input and returns the filtered text. Explore Philter's API for details.
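
As a rough illustration of what a client call might look like, the sketch below builds an HTTP POST request with Python's standard library. The host, the `http` scheme, the `/api/filter` path, and the `text/plain` content type are assumptions made for this example; confirm them against Philter's API documentation before relying on them. Port 8080 is the default API port listed in the system requirements.

```python
import urllib.request

# Assumed values for illustration only: scheme, host, path, and content type.
PHILTER_URL = "http://localhost:8080/api/filter"


def build_filter_request(text: str, url: str = PHILTER_URL) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying the text to filter."""
    return urllib.request.Request(
        url,
        data=text.encode("utf-8"),
        headers={"Content-Type": "text/plain"},
        method="POST",
    )


def filter_text(text: str, url: str = PHILTER_URL) -> str:
    """Send the text to a running Philter instance and return the filtered text."""
    with urllib.request.urlopen(build_filter_request(text, url)) as response:
        return response.read().decode("utf-8")
```

`build_filter_request` is kept separate so the request can be inspected without a running server; `filter_text` only works when a Philter instance is reachable at the given URL.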

What types of sensitive information does Philter support and can I customize the types?

The predefined types of sensitive information supported by Philter are PII and PHI identifiers like names, dates, email addresses, and social security numbers.
Yes, you can customize the types of sensitive information. For example, for a given use-case you may only be interested in removing names and phone numbers and not be concerned about email addresses.
You can also customize the types of sensitive information by creating new types through custom patterns and dictionaries. Dictionary types can be "fuzzy" to allow for misspellings through user-configurable sensitivity levels.
These customizations are made in a file called a filter profile.
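
To show how a "fuzzy" dictionary can tolerate misspellings, here is a minimal sketch based on edit distance. It is one possible technique, not a description of how Philter matches terms. The sensitivity level names, their thresholds, and the function names are hypothetical.

```python
# Hypothetical mapping from a sensitivity level to the maximum number of
# single-character edits tolerated. These names and values are illustrative only.
MAX_EDITS = {"low": 0, "medium": 1, "high": 2}


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed one row at a time."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute or keep
            ))
        previous = current
    return previous[-1]


def in_dictionary(token: str, dictionary: set[str], sensitivity: str) -> bool:
    """Return True if the token is within the allowed edits of any dictionary term."""
    limit = MAX_EDITS[sensitivity]
    return any(edit_distance(token.lower(), term.lower()) <= limit for term in dictionary)


if __name__ == "__main__":
    terms = {"warfarin", "metformin"}
    print(in_dictionary("warfrin", terms, "low"))     # exact matches only
    print(in_dictionary("warfrin", terms, "medium"))  # one edit allowed
```

In this sketch a higher sensitivity catches more misspellings at the cost of more false matches, which is the trade-off a user-configurable level is meant to expose.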

What are some other capabilities of Philter?

Philter offers additional features to give control over the filtering process.

Ignoring Specific Sensitive Information

You can provide a list of sensitive information values that should always be ignored. For example, suppose the social security number 123-45-6789 appears in your text and is being removed. You know this is not a real social security number, so you add it to the ignore list. Once it is added, Philter no longer removes 123-45-6789 when it appears in the input text.
For more information see Ignoring Sensitive Information.
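
The effect of an ignore list can be sketched in a few lines of Python. This is an illustration of the idea only: the pattern, the placeholder format, and the `redact_ssns` function are assumptions, and Philter's ignore lists are configured through Philter itself rather than in code like this.

```python
import re

# Illustrative pattern for SSN-like values; not Philter's actual filter.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def redact_ssns(text: str, ignored: frozenset[str] = frozenset()) -> str:
    """Replace SSN-like values unless the matched value is on the ignore list."""
    def replace(match: re.Match) -> str:
        value = match.group(0)
        # Ignored values pass through untouched; everything else is redacted.
        return value if value in ignored else "{{{REDACTED-ssn}}}"

    return SSN_PATTERN.sub(replace, text)


if __name__ == "__main__":
    ignore_list = frozenset({"123-45-6789"})
    print(redact_ssns("Test 123-45-6789, real 987-65-4321.", ignore_list))
```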

Storing Replaced Sensitive Information Values

Philter can keep a history of the sensitive information it removes from text by storing those values in a data store. This is useful if you later need to retrieve the sensitive information that was replaced or reconstruct the original document.
For more information see Replacement Values Store.
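
The idea of keeping replaced values so that a document can later be reconstructed can be sketched as follows. An in-memory dictionary stands in for the data store, and the function names, the document identifier, and the placeholder format are all hypothetical. The sketch also assumes the original text never contains the placeholder string itself.

```python
import re

# Illustrative pattern and placeholder; not Philter's actual configuration.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
PLACEHOLDER = "{{{REDACTED-ssn}}}"


def filter_and_store(document_id: str, text: str, store: dict[str, list[str]]) -> str:
    """Redact SSN-like values and record each removed value, in order, under document_id."""
    removed: list[str] = []

    def replace(match: re.Match) -> str:
        removed.append(match.group(0))
        return PLACEHOLDER

    filtered = SSN_PATTERN.sub(replace, text)
    store[document_id] = removed
    return filtered


def reconstruct(document_id: str, filtered: str, store: dict[str, list[str]]) -> str:
    """Restore the stored values by filling placeholders from left to right."""
    for original in store[document_id]:
        filtered = filtered.replace(PLACEHOLDER, original, 1)
    return filtered


if __name__ == "__main__":
    store: dict[str, list[str]] = {}
    redacted = filter_and_store("doc-1", "A 123-45-6789 and B 987-65-4321.", store)
    print(redacted)
    print(reconstruct("doc-1", redacted, store))
```

Because the store holds the original sensitive values, it needs the same protection as the unfiltered documents.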

Why would I use Philter instead of a list of regular expressions or other manual scripts?

Great question, and a fair one. A list of regular expressions executed sequentially to find patterns in text is actually what led to the development of Philter. The list became long, convoluted with logic, and hard to manage. As the list grew, it failed to scale to support multiple use-cases. More time was spent managing the list than actually using it.
Philter solves these problems by providing a centralized means of defining and executing the filters. Filter profiles encapsulate the filters and the logic required to apply them. The filter profiles are modular and can be interchanged based on the input text.
Philter's API provides a standard interface for filtering text and can be consumed from virtually any programming or scripting language, making it easy to integrate Philter into any new or existing system.
Philter's capability to find persons' names in text uses state-of-the-art natural language processing techniques and technologies. The models employed by Philter were trained on text from various domains to improve Philter's performance across many use-cases. Regular expressions can't do that.

What are the system requirements?

When launched from a cloud marketplace, Philter is pre-configured and contains all necessary dependencies. Philter requires the following:
  • 2 vCPU (e.g., m5.large instance type on AWS)
  • 8 GB of RAM
  • Open port 8080 for API requests (port can be changed in Philter's Settings)
  • Java 11
For improved performance, increase vCPUs to 4 and RAM to 16 GB.