A peek into the world of data strategy #5

This week's post includes articles from LinkedIn, Uber, Netflix, Google, AWS, Elastic and a blog topic on 'Data Gateways in the Cloud Native Era'

Jun 02, 2020

Who is this post for?

Product Managers and Engineers passionate about building insights using data and scaling their analytics pipelines.

Love this blog? Sign up to receive weekly updates.

LinkedIn open-sourced spark-inequality-impact, an Apache Spark library for measuring and reducing inequality and for avoiding unintended inequality consequences. The post describes how they use measures such as the Atkinson index to evaluate product changes, and goes into more depth on A/B inequality testing.

Article: https://engineering.linkedin.com/blog/2020/bringing-project-every-member-to-life
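For readers new to the Atkinson index mentioned above, here is a minimal plain-Python sketch of the standard formula. It is not the spark-inequality-impact API; the function name, the `epsilon` default, and the sample control/treatment numbers are all made up for illustration.

```python
import math


def atkinson_index(values, epsilon=1.0):
    """Atkinson inequality index: 0 means perfect equality, values near 1 mean high inequality.

    `epsilon` is the inequality-aversion parameter (larger values weight
    the low end of the distribution more heavily).
    """
    if epsilon < 0:
        raise ValueError("epsilon must be non-negative")
    if not values or any(v <= 0 for v in values):
        raise ValueError("values must be a non-empty list of positive numbers")

    n = len(values)
    mean = sum(values) / n

    if math.isclose(epsilon, 1.0):
        # Limit case epsilon = 1: equally distributed equivalent is the geometric mean.
        ede = math.exp(sum(math.log(v) for v in values) / n)
    else:
        # General case: power mean of order (1 - epsilon).
        ede = (sum(v ** (1 - epsilon) for v in values) / n) ** (1 / (1 - epsilon))

    return 1 - ede / mean


if __name__ == "__main__":
    # Hypothetical per-member metric (e.g. sessions) in each arm of an A/B test.
    control = [1, 1, 2, 10, 30]
    treatment = [3, 4, 5, 10, 22]
    print("control  :", round(atkinson_index(control), 3))
    print("treatment:", round(atkinson_index(treatment), 3))
```

In an A/B setting, the index would be computed per experiment arm; an arm with a lower index spreads the metric more evenly across members, even if its average is unchanged.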

Uber writes about Meta-Graph, a framework for few-shot link prediction that trains ML models to adapt quickly to new, sparse graph data. According to their research, the framework performs far better than baselines such as Model-Agnostic Meta-Learning (MAML) and a baseline that is pre-trained and then fine-tuned on the test graphs.

Article: https://eng.uber.com/meta-graph/

Netflix writes about how they enrich VPC Flow Logs at hyper scale, using their Sqooby architecture to ingest hundreds of thousands of VPC logs per hour and provide visibility into their cloud ecosystem.

Article: https://netflixtechblog.com/hyper-scale-vpc-flow-logs-enrichment-to-provide-network-insight-e5f1db02910d

AWS announced that Kendra, its enterprise search service, is now generally available. The release includes connectors for Salesforce, ServiceNow and Microsoft’s OneDrive, along with an improved vocabulary with domain-specific terms, faster indexing, better accuracy, and new scaling options.

Article: https://aws.amazon.com/about-aws/whats-new/2020/05/amazon-kendra-is-now-generally-available/

Elasticsearch 7.7.0 has been released, bringing major features and improvements such as asynchronous search, a password-protected keystore, and faster time-sorted queries.

Article: https://www.infoq.com/news/2020/05/elasticsearch-7-7-released/
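To give a feel for the asynchronous search feature, the sketch below builds (but does not send) the HTTP requests for the `_async_search` endpoints using only the Python standard library. The helper names, the `logs` index, the `@timestamp` sort field, and the localhost URL are assumptions for illustration, not details from the release notes.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request


def build_async_search_request(base_url, index, query, wait="2s", keep_alive="1d"):
    """Build a POST to /<index>/_async_search that submits a search in the background."""
    params = urlencode({
        # How long the call blocks before returning a partial response and an id.
        "wait_for_completion_timeout": wait,
        # How long the cluster keeps the search and its results available.
        "keep_alive": keep_alive,
    })
    body = {
        "query": query,
        # Time-sorted query; "@timestamp" is an assumed field name.
        "sort": [{"@timestamp": "desc"}],
    }
    return Request(
        f"{base_url}/{index}/_async_search?{params}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_async_status_request(base_url, search_id):
    """Build a GET to /_async_search/<id> that fetches current results for a submitted search."""
    return Request(f"{base_url}/_async_search/{search_id}", method="GET")


def build_async_delete_request(base_url, search_id):
    """Build a DELETE to /_async_search/<id> that cancels or cleans up a submitted search."""
    return Request(f"{base_url}/_async_search/{search_id}", method="DELETE")


if __name__ == "__main__":
    req = build_async_search_request("http://localhost:9200", "logs", {"match_all": {}})
    print(req.get_method(), req.full_url)
    # To actually run it against a cluster: urllib.request.urlopen(req)
```

The usual flow is submit, then poll the status endpoint with the returned id until the response reports the search is no longer running, then delete the search if the results are not needed any longer.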

Bilgin Ibryam wrote an article, “Data Gateways in the Cloud Native Era”, on how application architectures have evolved to include API-gateway-style components for the data layer. These data gateways provide abstraction, security, scaling, federation, and contract-driven development features focused on the data aspect.

Article: https://www.infoq.com/articles/data-gateways-cloud-native/

Google open-sourced Table Parser (TAPAS), a deep-learning system that answers natural-language questions over tabular data. Trained on over 6.2 million tables from Wikipedia, it matches or exceeds previously reported performance on Microsoft’s Sequential Question Answering (SQA), Stanford’s WikiTableQuestions (WTQ), and Salesforce's WikiSQL.

Article: https://www.infoq.com/news/2020/05/google-natural-language-tables/

That’s all for this week. Your feedback is welcome!

Note: This blog series is for informational purposes only. All views are my own and do not represent those of my employers.

