Show HN: Open-Source Data Replication and Anonymization https://ift.tt/TwbR8Hp

Hey HN, we're Evis and Nick, and we're excited to be launching Neosync on HN! Neosync is an open source data replication and anonymization project that helps developers create safe, anonymized test data and sync it across all environments for high-quality local, stage and CI testing.

This is how it works:

1. You select a job type. Today we support data sync jobs (these sync data between two databases and run on a schedule you define) and data generation jobs (these generate synthetic data from scratch and send it to a destination).

2. Next you define your source database and one or more destination databases.

3. Next we pull in the schema from the source DB and you decide how you want to transform your data. We ship with 40+ transformers (email, first name, address, random int64, random string, random float64, etc). You can create your own custom transformations as well. We've designed our transformers to be as flexible as possible so you can use them across almost any data type. You could also use Neosync in passthrough mode, which means that none of the data is transformed and you can use it for plain data replication.

4. Lastly, you can define subsets. This is a way to filter the data that gets sent to the destination. You can provide a custom SQL query or filters to do this. For example, you can filter the data by an id, customerType, column, date, etc.

And that's it! The job will run on the schedule you determine. We handle things like retries and backoffs and referential integrity between tables.

We also ship with APIs, a CLI and a GitHub Action so that you can use Neosync to hydrate a CI database in your CI pipeline. We're working on releasing a Terraform provider shortly.

Deployment is pretty straightforward. You can deploy Neosync using Docker Compose (we provide a script) or on Kubernetes using our Helm chart.

So what's next? Here's a brief overview: real time mode (hook up Neosync to Kafka/SQS to anonymize data and send it to destinations in real time) and more connections (MongoDB, Snowflake, CSV). On the ML side, we want to support use cases like consistent data generation (providing a seed value), statistically consistent data and more. You can check out our roadmap in our GitHub project.

Here's a brief Loom demo: https://ift.tt/s3aNeYF?...

We'd love your feedback and contributions. We strongly believe that your data should be yours and should stay on your infrastructure, and that open source is the best way to bring that vision to life.

https://ift.tt/zvM3qr1 December 8, 2023 at 12:09AM
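
To make steps 3 and 4 above a bit more concrete, here is a rough Python sketch of the idea behind a seeded email transformer combined with a subset filter. This is not Neosync's code or API: the function names (`anonymize_email`, `subset`), the HMAC-based approach, the seed value and the sample rows are all assumptions made purely for illustration.

```python
import hashlib
import hmac


def anonymize_email(email: str, seed: bytes) -> str:
    """Map an email to a fake one; the same input and seed always give the same output."""
    digest = hmac.new(seed, email.lower().encode("utf-8"), hashlib.sha256).hexdigest()
    # example.com is reserved for documentation, so the result never reaches a real inbox.
    return f"user_{digest[:12]}@example.com"


def subset(rows, predicate):
    """Keep only the rows that match the filter (the role a subset plays in step 4)."""
    return [row for row in rows if predicate(row)]


# Hypothetical source rows, standing in for a table pulled from the source DB.
source_rows = [
    {"id": 1, "customerType": "enterprise", "email": "Alice@corp.io"},
    {"id": 2, "customerType": "free", "email": "bob@mail.net"},
    {"id": 3, "customerType": "enterprise", "email": "alice@corp.io"},
]

SEED = b"demo-seed"  # assumption: any fixed secret value can act as the seed

# Filter first, then transform only the rows that will be sent to the destination.
selected = subset(source_rows, lambda r: r["customerType"] == "enterprise")
destination_rows = [{**r, "email": anonymize_email(r["email"], SEED)} for r in selected]

for row in destination_rows:
    print(row)
```

Because the mapping in this sketch is deterministic, the same source email maps to the same fake email wherever it appears, which is one simple way to keep joins on that column lined up after anonymization.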
