Fixing by breaking

Until last week, I had a ‘lab’ running 24/7 in the closet of my home office. I use it to periodically test new releases of virtualization and container software from the likes of Pivotal and VMware. However, a machine running 24/7 and drawing ~150 Watts while idle is a bit of a waste. The one thing preventing me from switching it off was a virtualized NAS (running a ZFS pool), which I also use to expose file shares to the rest of the home. So that had to keep running.

I made a plan to move the NAS VM to another, less energy-hungry machine: my existing desktop PC with an Intel 4670K, which, when undervolted, would be an ideal host for the NAS and vCenter. So I started by replacing the CPU cooler with a Noctua to make it run silently (I accidentally ordered the massive 14 cm one instead of the 12 cm one, and it almost didn’t fit over the RAM), and moved the drives over:

Build with the Noctua NH-D14

Time to turn it on: immediate power down again… something was wrong. When I unplugged the two drives on the last cable I had connected, the machine would start; plug them back in and it would power off. Something had to be shorting in the HDDs, but how?

I use modular power supplies in all my builds, as it keeps the build as clean as possible. So when moving the drives, I unplugged the SATA power cable and moved it over to the new case as well, since both power supplies use the same connectors. Ideal, right?

Well, there’s no discernible difference except that the pinout is different… which is evil, because the physical contract (the connector) implies something about the electrical contract (the pinout): if it physically fits, it should be electrically fine as well. I measured the voltages and discovered 12V must have been applied to the 5V SATA input, and 5V to one of the grounds… oops. My next step was to check, with the ‘correct’ modular cable, whether the drives would still spin up, but no luck. Something was fried.

Now I have backups of my really important documents in other places as well, but these two mirrored drives breaking at the same time was a nuisance. They held things like my Steam library and most of the VMs for my lab, and it would be a lot of work to get that back. I couldn’t let it go and started to investigate what had happened; perhaps it was fixable after all.

The first thing I did (after some googling) was unscrew the PCB of one of the drives and turn it over:

PCB of a Seagate 2TB desktop drive. The SATA connector pins are visible on the bottom.

So nothing was obviously wrong or burned here… which was good and bad. Good, because the whole PCB wasn’t fried; bad, because I still didn’t know what was. The one thing I did notice is that this SMD stuff is INSANELY tiny: if it came to modifications, I would never be able to do much. I hoped that wouldn’t be necessary.

That hope was quickly quenched as I searched for replacement PCBs online. It appears that even if you find a replacement PCB of the exact version your drive needs (my two identically sized drives from the same vendor series had wildly different PCBs), there’s a BIOS chip on each of these PCBs that’s unique to the drive, and you have to transplant it to the replacement PCB to get access to the drive again. I had neither the tools to do that myself nor the patience to ship my PCB off and have it all done for me.

More googling awaited, and I learned that the inputs on these PCBs typically come with protective circuitry to shield the controller chips from wrong voltages and from electrostatic discharge (ESD). These protections come in the shape of Transient Voltage Suppression (TVS) diodes. They are designed to safely short huge but brief voltage spikes (ESD) to ground and keep functioning. It turns out they are less capable when you apply a constant overvoltage :(

When they blow, they fail short (instead of open), creating a permanent short to ground that prevents the drive from ever spinning up again (and makes the power supply shut down immediately).

Zoom in of PCB close to the SATA connector. The 5V and GND inputs are highlighted as well as the presumed 5V TVS and microfuses (0 Ohm resistors).

There should be two TVS diodes on most PCBs: one for the 5V and one for the 12V input. I searched for reference images and measured the resistance across the two chips that looked like the ones I found online. One was short-circuited. The traces on the PCB connected it to the 5V SATA input and ground, which fit the hypothesis that 12V had accidentally been applied to the 5V input. So this was likely the broken TVS. As everything is so small, not every chip is identifiable, so some educated guessing remains involved here…

So I clipped it, which was the most nerve-wracking thing I did last week.

Clipping it breaks the short from 5V to GND. It also removes all ESD protection forever, so I’d better ground myself properly when hooking the drive up again. Which is what I did next.

And the drive still didn’t work…

This was caused by the second layer of protective circuitry: 0 Ohm resistors, or ‘SMD fuses’. When too large a current flows through them, they break, which makes their resistance jump from roughly 0 to ~100 Ohms. As you can see in the image, the two highlighted ones were all that remained in the path from 5V to GND after the TVS shorted, so they failed as well. I confirmed the theory by measuring a high resistance across them.

I tried to bridge them with a blob of solder. Did I mention everything is insanely tiny yet? After fumbling with my soldering iron for some time I gave up and examined the PCB some more. I found a pad on the PCB connected to the 5V input, interrupted only by the six microfuses (two of which were broken), and soldered a wire from the 5V input to that pad, bypassing all of them. All safeties were gone now.

Zoom of the PCB without the clipped TVS, and with a wire bypassing the 0 Ohm resistors.

But it worked 😀

I quickly resilvered my ZFS pool with a new disk, copying all the data, and I was operational again a few hours later.

Now that my old desktop was serving as the NAS host, I needed a new desktop. More on that later :)

CF Summit Europe 2017

The European edition of Cloud Foundry Summit is just around the corner.

Next week, an ITQ delegation will travel to Basel, Switzerland for the conference about the fastest-growing cloud-native platform. After Berlin in 2015 and Frankfurt in 2016, Basel will for a couple of days be the venue where around 1000 Cloud Foundry developers, vendors, consultants and users meet to discuss the direction of the platform and its ecosystem.

Cloud Foundry has proven over the last few years that it’s not just another cool new technology. It has been adopted as the platform of choice by organizations with a combined market cap of over $12 trillion. Together with its vibrant community, this means CF is here to stay.

However, CF is not just about technology. In fact, it’s quite the opposite: the core idea is to help organizations enter the digital age and change the way they work and build their core products. CF is just the technology that makes it possible.

If the last few years were any indication, the keynotes will be all about organizations making fundamental transformative changes and becoming much more efficient by embracing Cloud Foundry. The remainder of the schedule is split into 6 tracks, each focussed on a different facet:

CF tracks

From die-hard technology to real-world use cases – there’s something for everyone: the core project updates, extension projects, and experiments tracks will have a strong technical focus aimed at platform engineers and specialists. Cloud-native microservices won’t be any less technical, but rather focussed on how to develop software that runs well on CF and leverages all the goodness of the cloud. The Cloud Foundry at scale track focusses on the heavy users (IoT, service providers, multi-cloud). Finally, the Cloud Foundry in the enterprise track focusses on real-world use cases.

There is obviously a wealth of information in the sessions, but most importantly this is the place to meet and engage with the Cloud Foundry community at large.

Hope to see and meet you there!

Pivotal Container Service

This week at VMworld US, a couple of guys in suits and Sam Ramji, representing VMware, Pivotal, and Google, introduced Pivotal Container Service. Exciting stuff! But what does it mean?

Applications on modern platforms run in containers. And while containers themselves are increasingly standardized around the Open Container Initiative (OCI), there are huge differences in how platforms build and run these containerized workloads. This is a direct reflection of the intended platform use cases and corresponding design choices.

Cloud Foundry takes an application-centric approach: developers push the source code of their app, and the platform builds a container and runs it. As a dev you never have to deal with the creation and orchestration of containers – they are platform intrinsics which can be tweaked by the Operations team. So in CF everything is focussed on developer productivity and DevOps enablement: an ideal platform for reliable and fast modern software development.
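
To make that concrete, here is a minimal sketch, not taken from any official docs, of what that workflow looks like when scripted in Python around the cf CLI. It assumes the CLI is installed, you are already logged in and targeted at an org and space, and the app name and path are purely illustrative:

    # Minimal sketch of the CF application-centric workflow, driven from Python.
    # Assumes the cf CLI is installed and you are logged in and targeted;
    # "my-app" and its source path are illustrative.
    import subprocess

    def cf(*args: str) -> None:
        """Run a cf CLI command and raise if it fails."""
        subprocess.run(["cf", *args], check=True)

    # Push the app straight from its source directory: no Dockerfile, no image build.
    # The platform stages the source with a buildpack, builds the container and runs it.
    cf("push", "my-app", "-p", "./src/my-app")

    # Day-2 operations stay at the same level of abstraction: scaling out is one command.
    cf("scale", "my-app", "-i", "3")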

Other use cases do exist in which you do want to bring your own container (BYOC), e.g. containerized legacy apps, applications already containerized by ISVs, stateful apps and databases, or cases in which dev teams already build containers as part of their build process. Although I would recommend that the dev teams in the last example check out Cloud Foundry, these are all valid use cases – use cases best served by a container-centric platform.

Kubernetes is such a container-centric platform. In fact, it’s the most mature and battle-tested platform out there, being the open source spin-off of Google’s internal container platform. However, it’s also notoriously hard to deploy and manage correctly. Google Cloud Platform introduced the managed Google Container Engine (GKE) to solve this problem in the public cloud.

Pivotal Container Service (PKS) is the answer for the private cloud. Pivotal solved the problem of deploying and managing distributed systems some years ago with BOSH – an infrastructure as code tool for deploying (day 1) and managing (day 2) distributed systems. Not coincidentally BOSH is the foundation and secret ingredient of Cloud Foundry.
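
To give a flavour of what that means in practice, here is a hedged sketch of day-1 and day-2 operations against a BOSH director; the deployment and manifest names are made up, and the commands assume the BOSH v2 CLI:

    # Flavour of BOSH day-1/day-2 operations, driven from Python.
    # "my-cluster" and the manifest file are placeholders; commands follow the BOSH v2 CLI.
    import subprocess

    def bosh(*args: str) -> None:
        # -n makes the CLI non-interactive so it can run unattended.
        subprocess.run(["bosh", "-n", *args], check=True)

    # Day 1: deploy a distributed system from a declarative manifest.
    bosh("-d", "my-cluster", "deploy", "my-cluster.yml")

    # Day 2: let BOSH detect and automatically repair unhealthy VMs in that deployment.
    bosh("-d", "my-cluster", "cloud-check", "--auto")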

PKS is Kubernetes on BOSH (Kubo), with tons of extras to make it enterprise friendly:

  • deep integration with VMware tooling on vSphere (vRealize Operations, Orchestrator, Automation)
  • integration with VMware NSX virtual networking
  • access to Google Cloud APIs from everywhere through a GCP Service Broker
  • production ready – enterprise scaled
  • supported

Be prepared for Q4 availability! In the meantime I can’t wait for beta access to test-drive it myself.

.NET on Cloud Foundry

As a consultant on the Cloud Foundry platform I regularly get asked whether CF can host .NET applications. The answer is yes. However, how much we as platform engineers have to do to make that possible depends on the application. Chances are you don’t have to do anything special at all. That chance is, however, quite low, as I’ll explain below.

Note that I wrote a post on the same topic some two years ago. Now that Diego, .NET Core, and Concourse have all reached production status, it’s time to see how the dust has settled.

The old and the new

The .NET Framework we have become used to over the last 16 years or so is at version 4.6.x. It is essentially single-platform (Windows), closed source, installed and upgraded as part of the OS, has a large footprint, and is not especially fast.
Microsoft realized at some point that this just wouldn’t do anymore in the modern cloud era, in which frameworks are developed as open source, without explicit OS dependencies, and applications are typically deployed as a (containerized) set of lightweight services packaged together with their versioned dependencies (libraries and application runtime). Some time later, the world saw the first alphas and betas of .NET Core, and on June 27th, 2016 it reached GA with version 1.0.0. This made lots of people very happy and was generally seen as a good move (albeit quite a late one).
While Microsoft is still actively developing both the legacy .NET Framework and .NET Core, it has made pretty clear that .NET Core is the future.

So what about apps?

ASP.NET Core in a Nutshell

An application written as an (ASP).NET Core app will run on both the old and the new – although it sometimes takes some convincing from the community to keep it that way. The opposite is not the case: many Windows-specific/Win32 APIs are, for obvious reasons, not available on the cross-platform .NET Core runtime, so legacy .NET apps that take a dependency on these APIs cannot run on .NET Core without refactoring.
Note that this dependency doesn’t have to be explicit: it’s about the whole dependency chain. For instance, the popular Entity Framework ORM library depends on ADO.NET, which in turn is highly dependent on Win32, and so cannot be used. Applications using it should instead be rewritten against the new EF Core library.

New is easy – .NET Core

From a platform engineering perspective, supporting .NET Core is easy. Since .NET Core can run in a container on Linux, it follows the default hosting model of CF. So you just install/accept the dotnet-core buildpack and off you go.
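
As a hedged illustration (the buildpack name below is the one commonly shipped with CF; check what your foundation actually offers with ‘cf buildpacks’, and the app name and path are made up), pushing an ASP.NET Core app then looks like this:

    # Sketch: pushing an ASP.NET Core app with the .NET Core buildpack.
    # App name, path and buildpack name are assumptions about your environment.
    import subprocess

    def cf(*args: str) -> None:
        subprocess.run(["cf", *args], check=True)

    # The buildpack restores and publishes the project; the platform then builds
    # and runs the resulting Linux container.
    cf("push", "orders-api", "-p", "./src/OrdersApi", "-b", "dotnet_core_buildpack")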

Old is hard – .NET Framework

Of course, you can attempt to convince your developers that they have to port their code to .NET Core. However, your mileage may vary. Since legacy is what makes you money today, a large existing .NET codebase that is the result of years of engineering can’t be expected to be rewritten overnight. And if it’s rarely updated, it will be very hard to make a business case for porting it, even if you do have the resources.
A more realistic scenario is a minimal refactoring in which the vast majority of the never-touched cold code stays on the .NET Framework, while all the new code, together with the often-changed hot code, is written in .NET Core.

It needs Windows – Garden has your back

Cloud Foundry before 2015 used Warden containers, which took a hard dependency on Linux. The rewrite of the DEA component of Cloud Foundry in Go, resulting in DEA-Go, or Diego, was covered quite a lot online. For .NET support, the accompanying rewrite of Warden in Go – resulting in Garden *badum tish* – is much more interesting, since Garden is a platform-independent API for containerization. So what we need for a Windows Diego Cell is:

  • a Windows Garden backend – so CF can provision workloads on the VM
  • the BOSH agent for Windows – so the VM can be managed in the same way all of Cloud Foundry is managed at the infrastructure level

We need to package all of this into a template VM image (a stemcell) so BOSH can use it. You can find the recipe for doing this, and some automation scripts, here. Even with the scripts it’s a lot of cumbersome, time-consuming, and error-prone work, so you’d best automate it. In my next post I’ll discuss how I did that using a pipeline in Concourse CI.

If you are on a large public cloud like Azure, GCP or AWS and use Pivotal Cloud Foundry, Pivotal has supported stemcells ready for download. If you are on a private cloud, or not using PCF, you have to roll your own. I’m not sure why Pivotal doesn’t offer vSphere or OpenStack Windows stemcells, but I can imagine it has something to do with legal issues (think Microsoft, licensing, and redistribution).

Final steps

PCF Runtime for Windows

Once you have the stemcell you need to do a few things:

  • deploy Windows Diego cells with it using BOSH, so CF has Windows capacity to run workloads on
  • make a Windows buildpack available so developers can actually push their .NET Framework apps to those cells

Again, if you are using PCF, Pivotal has you covered: you can download and install the PCF Runtime for Windows tile, which takes care of both of the above. If you are on vanilla CF, you have to do some CLI magic yourself.
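
For vanilla CF, a rough outline of that CLI magic could look like the sketch below. Every name in it is an assumption about a specific environment: the stemcell file, deployment name, manifest, ops file, Windows stack and buildpack all differ per release, so treat it as the shape of the work rather than a recipe:

    # Rough outline of adding Windows support to vanilla CF, driven from Python.
    # All file names, the deployment name, the stack and the buildpack below are
    # placeholders for whatever your BOSH/CF versions actually use.
    import subprocess

    def run(*args: str) -> None:
        subprocess.run(list(args), check=True)

    # 1. Make the Windows stemcell available to the BOSH director.
    run("bosh", "upload-stemcell", "./bosh-stemcell-windows.tgz")

    # 2. Deploy Windows Diego cells (Garden Windows backend + BOSH agent)
    #    alongside the existing CF deployment.
    run("bosh", "-d", "cf", "deploy", "cf.yml", "-o", "windows-cells.yml")

    # 3. Push a legacy .NET Framework app onto the Windows stack.
    run("cf", "push", "legacy-app", "-s", "windows2012R2", "-b", "hwc_buildpack")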

Why you need a platform

Last month I visited Cloud Foundry Summit Europe. After last year’s last-minute six-hour drive to Berlin (our flight was cancelled due to fog):
CFSummitDrive2015
Frankfurt was much closer to home, and this time we had some time for pre-conference beers:
CFSummitBeers2016

As in the year before, I was captivated by the positive energy around this event. The community breathes a ‘we know this is the next big thing’ atmosphere.

The quality of the talks was also great, ranging from cultural topics (diversity in IT) to hardcore tech (CF’s next networking stack).

But the takeaway talk for me was the keynote by Daniel Jones, aka ‘the moustache’: without talking about technology at all, he drove home why you need a platform like Cloud Foundry.

DevOps, Docker & all other IT buzzwords – what’s in it for me?

Earlier this year, together with Martijn Baecke and Jan-Willem Lammers (both VMware), I presented a fireside chat at the Dutch VMUG UserCon – the largest worldwide, with an attendance of 1000+.

The idea was to step beyond the hype and the technology and see what these new developments really have to offer the business and IT.

We did this with a roleplay of the IT stakeholders of HypotheticalCorp: ‘dev’ (me), ‘ops’ (Martijn), and ‘business’ (Jan-Willem)
NLVMUG2016Talk

Jan-Willem, freshly shaven and suited up after the bearded ‘DevOps’ talk he gave in the morning:
NLVMUG2016Beards

Check out the talk below (in Dutch):

vRO API Explorer

It’s been 5 months since we released the first version of the vCenter Orchestrator API Explorer. Because people seem happy with it (we see a steady stream of feedback and returning users), we continue to improve the tool.

Over the last couple of weeks we’ve worked on:

  • performance improvements – it’s even faster now
  • move to .NET Core 1.0.0 (it was still running RC1)
  • documentation links in the vCenter API objects now link directly to the relevant VMware documentation

Please let us know if you are missing plugins, or would like certain features.

Container confusion

These days I’m working at a client, creating workflows for their state-of-the-art private cloud platform. It’s really quite nice: internal clients can use a web portal to request machines, which are then customized with Puppet runs and workflows that install additional software and perform custom tasks like registering the machine in a CMDB. All of this is ideal for running legacy workloads like SQL databases.

Other offerings include ‘PaaS’ workloads for running business applications, e.g. developers can request ‘Scale Out’ application servers, meaning two Linux VMs with Tomcat installed behind a load balancer.

The most popular offering by far is the large VM with a preinstalled Docker engine. In fact, they are so popular you might wonder why.

Is it because Developers have an intrinsic desire to create and run Docker containers? Naively, the current hype around containerization in general and Docker as a specific technology could indeed be explained as such.

However, if you know Developers a bit you know what they really want is to push their code into production every day.

To get to this ideal state, modern development teams adopt Agile, Scrum, and Continuous Delivery. Sadly, the latter in particular usually fails to deliver to production in enterprise IT, giving rise to the waterscrumfall phenomenon: Continuous Delivery fails to break through the massive ITIL wall constructed by IT Ops to make sure no changes come through and uptime is guaranteed.

So guess what’s happening when your Dev/Business teams request the largest possible deployment of a Docker blueprint?

Yep, you’ve just created a massive hole in your precious wall. Your CMDB fills up with ‘Docker machine’ entries, and you have lost all visibility of what really runs where on your infrastructure.

Docker in production is a problem masquerading as a solution.

Does this mean containers are evil? Not at all. Containers are the ideal shipping vehicles for code. You just don’t want anyone to schedule them manually and directly. In fact, you don’t even want to create or expose the raw containers, but rather keep them internal to your platform.

So how do you reap the benefits of containers, stay in control of your infrastructure, and satisfy the needs of your Developers and Business all at the same time? With a real DevOps enablement platform: a platform that makes it clear who is responsible for what – Ops: platform availability, Dev: application availability – and that enables Developers to just push their code.