Big Endian Data

A personal blog on big data software development.
http://iandow.github.io

How to Stop Hardcoding Service Endpoints in Vue.js
One of the most common misunderstandings with Vue.js concerns how to define endpoints for backend services that cannot be resolved at build time. In this post I’m going to describe how to define dynamic configurations like backend endpoints so they can be determined at runtime. Vue.js is a very...
Mon, 13 Jan 2020 · http://iandow.github.io/2020-01-13-Vue_for_S3/

Running MediaInfo as an AWS Lambda Function
This post describes how to package MediaInfo so it can be used in applications hosted by AWS Lambda. AWS Lambda is a cloud service from Amazon that lets you run code without the complexity of building and managing servers. MediaInfo is a very popular tool for people who do video...
Tue, 10 Dec 2019 · http://iandow.github.io/2019-12-10-MediaInfo_AWS_Lambda/

Deep Dive into CORS Configs on AWS S3
I originally published this article on the AWS blog: https://aws.amazon.com/blogs/media/deep-dive-into-cors-configs-on-aws-s3/
For several weeks I’ve been trying to diagnose Cross-Origin Resource Sharing (CORS) errors in a web component I built for uploading files to AWS S3. This has been one of the hardest software defects I’ve had to solve in...
Fri, 04 Oct 2019 · http://iandow.github.io/2019-10-04-Deepdive_S3_CORS/

Running OpenCV as an AWS Lambda Function
This post describes how to package the OpenCV Python library so it can be used in applications that run in AWS Lambda. AWS Lambda is a Function-as-a-Service (FaaS) offering from Amazon that lets you run code without the complexity of building and maintaining the underlying infrastructure. OpenCV is one of...
Mon, 15 Apr 2019 · http://iandow.github.io/2019-04-15-OpenCV_AWS_Lambda/

Data Management Strategies for Computer Vision
Computer Vision (CV) developers often find that the biggest barrier to success is data management, and yet so much of what you’ll find about CV is about the algorithms, not the data. In this post I’ll describe three separate data management strategies I’ve used with applications that process images. Through...
Thu, 25 Oct 2018 · http://iandow.github.io/2018-10-25-Field_Studies_in_Computer_Vision/

Business Innovation through Data Transformation
Today I presented at the Seattle Technology Leadership Summit, a gathering of CxOs and upper management from a variety of companies. I made the case that companies can become more competitive by innovating with data-intensive applications and (secondarily) that MapR provides the best data platform to make...
Thu, 07 Jun 2018 · http://iandow.github.io/2018-06-07-Seattle_SIM/

Using StreamSets and MapR together in Docker
In this post I demonstrate how to integrate StreamSets with MapR in Docker. This is made possible by the MapR persistent application client container (PACC). The fact that any application can use MapR simply by mapping /opt/mapr through Docker volumes is really powerful! Installing the PACC is a piece of...
Thu, 17 May 2018 · http://iandow.github.io/2018-05-17-StreamSets_MapR_Docker/

Creating Data Pipelines for IoT with StreamSets
If you think building data pipelines requires advanced software development skills, think again. A company called StreamSets has created software that enables you to build data pipelines using a drag-and-drop GUI. It frees you from the burden of writing code with the application programming interfaces (APIs) needed to ingest data,...
Mon, 12 Mar 2018 · http://iandow.github.io/2018-03-12-MQTT_StreamSets/

The MapR-DB Connector for Apache Spark
MapR just released Python and Java support for their MapR-DB connector for Spark. It already supported Scala; Python and Java are new. I recorded a video to help them promote it, and I also learned a lot in the process about how databases can be used in Spark...
Mon, 26 Feb 2018 · http://iandow.github.io/2018-02-26-MaprDB_Connector_Spark/

Predicting Time-Series data from OpenTSDB with RNNs in Tensorflow
I’ve been learning a lot of really interesting things about time-series data lately. Over the past month I’ve learned how to consume factory IoT sensor data from an MQTT server, process it in StreamSets, persist it in OpenTSDB, visualize it in Grafana, and forecast it with TensorFlow. It’s really amazing...
Tue, 30 Jan 2018 · http://iandow.github.io/2018-01-30-MQTT_RNN/

Predicting Forest Fires with Spark Machine Learning
Anytime you have lat/long coordinates, you have an opportunity to do data science with k-means clustering and visualization on a map. This is a story about how I used geo data with k-means clustering on a topic that has affected me personally: wildfires! Every summer...
Tue, 24 Oct 2017 · http://iandow.github.io/2017-10-24-Forest_Fires/

Joining streams and NoSQL tables for Customer 360 analytics in Spark
“MapR-DB is the perfect database for Customer 360 applications.” That’s the tag line I used to describe a demo I created for MapR for the Strata Data Conference in New York in September 2017. Describing Customer 360 as a use case for MapR-DB was the focus of this demo...
Tue, 03 Oct 2017 · http://iandow.github.io/2017-10-03-Customer360_Analytics_in_Spark/

Using Tensorflow on a Raspberry Pi in a Chicken Coop
Ever since I first heard about TensorFlow and the promises of Deep Learning I’ve been anxious to give it a whirl. TensorFlow is a powerful and easy-to-use library for machine learning. It was open-sourced by Google in November 2015. In less than 2 years it has become one...
Wed, 12 Jul 2017 · http://iandow.github.io/2017-07-12-Tensor_Chicken/

How to plot data on maps in Jupyter using Matplotlib, Plotly, and Bokeh
If you’re trying to plot geographical data on a map, then you’ll need to select a plotting library that provides the features you want in your map. And if you haven’t plotted geo data before, you’ll probably find it helpful to see examples that show different ways to do...
Tue, 27 Jun 2017 · http://iandow.github.io/2017-06-27-Mapping_in_Jupyter/

How to combine relational and NoSQL datasets with Apache Drill
Enterprise data science applications can rarely operate on data that is entirely contained within a single database system. Take, for instance, a company that wants to build a Customer 360 application that uses data sources across its enterprise to develop marketing campaigns or recommendation engines...
Mon, 01 May 2017 · http://iandow.github.io/2017-05-01-Apache_Drill/

Visualizing K-Means Clusters in Jupyter Notebooks
The information technology industry is in the middle of a powerful trend towards machine learning and artificial intelligence. These are difficult skills to master, but if you embrace them and just do it, you’ll be taking a very significant step towards advancing your career. As with any learning curve, it’s...
Tue, 18 Apr 2017 · http://iandow.github.io/2017-04-18-Jupyter_Customer360/

How To Clone Virtual Machines in Azure
I use Azure a lot to create virtual machines for demos and application prototypes. It often takes me a long time to set up these rigs, so once I finally get things the way I like them I really don’t want to duplicate that effort. Fortunately, Azure lets us clone VMs...
Mon, 17 Apr 2017 · http://iandow.github.io/2017-04-17-How_To_Clone_VMs_Azure/

Kafka vs MapR Streams Benchmark
A lot of people choose MapR as their core platform for processing and storing big data because of its advantages for speed and performance. MapR consistently performs faster than any other big data platform for all kinds of applications, including Hadoop, distributed file I/O, NoSQL data storage, and data streaming...
Mon, 20 Mar 2017 · http://iandow.github.io/2017-03-20-Benchmarking_MapR_Streams_vs_Kafka/

Automating MapR with MapR Stanzas
In my life as a technical marketer for MapR I have configured more clusters than you can shake a stick at. So, imagine my excitement when I heard that MapR installations can be automated with a new capability called “MapR Stanzas”. MapR Stanzas allow you to automate the MapR installation...
Mon, 13 Feb 2017 · http://iandow.github.io/2017-02-13-MapR_Stanzas/

What's wrong with using small batch sizes in Kafka?
What is Kafka’s batch size? Kafka producers buffer unsent records for each partition. These buffers are of a size specified by the batch.size config. You can achieve higher throughput by increasing the batch size, but there is a trade-off between more batching and increased end-to-end latency. The larger your...
Wed, 04 Jan 2017 · http://iandow.github.io/2017-01-04-Kafka_Batch_Size/
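A toy model can illustrate the batch-size trade-off described in the Kafka summary above. The Python sketch below is a minimal illustration built on stated assumptions and is not Kafka's producer. `ToyBatchingProducer` and `simulate` are hypothetical names. The model keeps one buffer per partition. It ships a buffer as a single request only when the next record would push it past `batch_size` bytes, which stands in for `batch.size`. The model ignores `linger.ms`, compression, and the background sender thread that real Kafka producers use.

```python
from collections import defaultdict


class ToyBatchingProducer:
    """Toy model of per-partition batching; NOT Kafka's real producer code."""

    def __init__(self, batch_size: int) -> None:
        self.batch_size = batch_size          # analogous to batch.size (bytes)
        self.buffers = defaultdict(list)      # partition -> [(record, enqueue_ms)]
        self.sent = []                        # (partition, n_records, oldest_wait_ms)

    def send(self, partition: int, record: bytes, now_ms: int) -> None:
        buf = self.buffers[partition]
        used = sum(len(r) for r, _ in buf)
        # Ship the current batch if adding this record would overflow it.
        if buf and used + len(record) > self.batch_size:
            self._flush(partition, now_ms)
        self.buffers[partition].append((record, now_ms))

    def _flush(self, partition: int, now_ms: int) -> None:
        buf = self.buffers.pop(partition)
        oldest_wait = now_ms - buf[0][1]      # the first record waited longest
        self.sent.append((partition, len(buf), oldest_wait))

    def close(self, now_ms: int) -> None:
        # Ship whatever is still buffered, as a producer does on close.
        for partition in list(self.buffers):
            if self.buffers[partition]:
                self._flush(partition, now_ms)


def simulate(batch_size: int, n_records: int = 1000, record_bytes: int = 100):
    """Send one fixed-size record per millisecond to a single partition."""
    producer = ToyBatchingProducer(batch_size)
    for i in range(n_records):
        producer.send(partition=0, record=b"x" * record_bytes, now_ms=i)
    producer.close(now_ms=n_records)
    requests = len(producer.sent)
    worst_wait = max(wait for _, _, wait in producer.sent)
    return requests, worst_wait


for size in (1_000, 10_000):
    print(size, simulate(size))
```

In this model, one 100-byte record arrives every millisecond. With 1,000-byte batches, the producer makes 100 requests and the oldest record in each batch waits 10 ms. With 10,000-byte batches, it makes only 10 requests, but the oldest record waits 100 ms. Larger batches mean fewer requests per record at the cost of higher latency.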