Street#Grid 2007 Impressions
- newyorkscot
- Apr 19, 2007
- 5 min read
I attended the Street#Grid conference at the W Hotel in Union Square the other day -- this is the first NY version of the City#Grid event held at the end of last year in London. This was definitely a more niche event compared to most and was pleasantly low on the vendor-factor, with only some of the main players there: Datasynapse, Platform, Tangosol (now Oracle!), Gigaspaces as well as Microsoft's Cluster Server.
What I liked about this conference was that it was more focused on the experiences of some of the main investment banks who have been doing grid and data fabric work for a while. However, there was clearly a difference of opinion between the banks and the vendors in terms of what is important from a technology standpoint, and as Marc mentions there was definitely a sad view of where the actual application developers sit in terms of priorities. Wachovia, Bear Stearns and JPMorgan all mentioned that their top priority is the manageability of their grids, and placed less importance on giving developers the tools to use the grids. At the same time (well, actually, towards the end of the last session) Adrian Kunzel, Global Head of Architecture of Investment Bank Technology at JPMorgan, stated that (quote) "...our developers can't develop multi-threaded code"!! Which is it? They can't develop applications for the grid because you haven't given them the tools, or you won't prioritize giving them tools because they can't develop multi-threaded code?
There was a lot of chat about virtualization, provisioning, etc, but ultimately what it all boiled down to was getting better utilization out of the compute assets of the bank. Also, the validity of "outsourced CPU" should be questioned -- given that less than 70% utilization prevails in the banks, demand has not yet met supply internally, so what is really needed are better ways to utilize the grid and to "find new workloads" for it.
Bob Hintze, VP Utility Computing, Wachovia, gave a pretty good and practical presentation on his views of running a grid at the bank. I liked his views on making decisions quickly and basing the choice of vendor on 'use cases' that actually mean something to the business. Too many people try to make decisions around potentially building the "intergalactic, all-vendor, all-problem" solution. Bob also stated that he has less focus on SDKs and the like, and is more interested in (global) manageability and functionality such as logging. That said, he later made the point that we need to be able to easily provide grid environments for disaster recovery (DR), business continuity, development and testing alike. Re: DR, he would prefer better "high availability" to DR since they are inherently intermingled anyway. SOA web services are at the core of what uses the grid at Wachovia.
Buzz Moschetti, Chief Architecture Officer, Bear Stearns, also gave a decent outline of the challenges he is facing at the bank. Bear uses an internally developed grid for cash/fixed income and Datasynapse for credit derivatives and Calypso. One of the ironies of building the grid is that it can certainly add complexity to the infrastructure, and can cause a lot more issues and side-effects if portions of the grid get maxed out in terms of utilization, or if they fail. Key things to figure out include: a) capacity planning, which requires a detailed view of business priorities and processes; b) inconsistent platform configuration -- you need to create a manifest (a sketch of the idea follows below); c) the challenges around versioning and incremental upgrading of software (and hardware) across grid nodes; d) epic-scale policy design -- reflecting Bob Hintze's comments about "Keeping It Simple, Stupid", there is no way to manage all aspects of all applications through policy while remaining sensitive to changes in the environment: you are better off keeping a clean view on what's going on and going from there.
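On the manifest point: here is a minimal, hypothetical sketch (my own illustration, not anything Bear described) of collecting a per-node platform manifest in Java. A central job could diff these key/value sets across engines to catch configuration drift before it shows up as inconsistent grid behavior.

```java
import java.net.InetAddress;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: collect a per-node "manifest" of platform configuration
// so nodes can be compared centrally and drift detected.
public class NodeManifest {

    public static Map<String, String> collect() throws Exception {
        Map<String, String> m = new TreeMap<String, String>();
        m.put("host", InetAddress.getLocalHost().getHostName());
        m.put("os.name", System.getProperty("os.name"));
        m.put("os.version", System.getProperty("os.version"));
        m.put("os.arch", System.getProperty("os.arch"));
        m.put("java.version", System.getProperty("java.version"));
        m.put("cpus", String.valueOf(Runtime.getRuntime().availableProcessors()));
        return m;
    }

    public static void main(String[] args) throws Exception {
        // Print as key=value lines; a central job could diff these across nodes.
        for (Map.Entry<String, String> e : collect().entrySet()) {
            System.out.println(e.getKey() + "=" + e.getValue());
        }
    }
}
```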
HP and IBM both pitched in on a few panels and seemed to be promoting open standards, better hardware acceleration (FPGAs, etc) and being more agile & flexible in managing the environments. They both pointed out that there are indeed issues with developers being able to adopt grid technology in their applications.
Adrian Kunzel gave a presentation in the afternoon which discussed the natural tension between the managers of datacenters and grids, whereby the datacenter guys are always looking to optimize capacity and standardize/commoditize the hardware, while the grid guys are looking for additional workloads and machines to run them on. He then went on to discuss how virtualization is a means rather than an end, with the ultimate end-game of the grid world being to increase utilization. Some other interesting points that he made included:
Development & Testing: virtualization can give you a lot of immediate benefit, since a new dev environment can be provisioned easily and run on cheaper machines. This is especially good for self-contained applications, and you can blow away the environment if you screw it up. (Sidenote: we have just completed a project where there was a virtual QA environment and it really sucked, because they only allocated a total of 500MB of RAM to the entire environment.)
Vertical Scaling: the only way to vertically scale applications is by cycle scavenging to reclaim headroom (a sketch of the idea appears at the end of this section), so that option runs out of steam pretty quickly.
Horizontal Scaling: this clearly can add capacity, but the issue is that provisioning is too slow and there is a lack of coherent monitoring. Provisioning technologies are trying to keep up, but falling behind.
Grids are pretty good for isolation and partitioning, providing controls and reasonable workload scheduling.
Virtualization does not have that many tools and has no distribution mechanics.
Bottom line, the banks are constantly struggling to get better asset utilization and more compute capacity while trying to reduce costs. What the banks really need is a new approach to modeling OS interactions and resource consumption, while also developing the provisioning technologies that support distributed systems.
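To make the cycle-scavenging point above concrete, here is a minimal, hypothetical sketch (my own illustration, not anything JPMorgan presented) of a worker that only pulls grid tasks when the host's load average suggests there is spare headroom. The task queue and the 0.5 load-per-CPU threshold are assumptions purely for illustration.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of cycle scavenging: only take work from the queue
// when the host's load average suggests there is spare headroom.
public class ScavengingWorker implements Runnable {

    private final BlockingQueue<Runnable> tasks;
    private final double maxLoadPerCpu; // e.g. 0.5 = only run when the box is under half busy

    public ScavengingWorker(BlockingQueue<Runnable> tasks, double maxLoadPerCpu) {
        this.tasks = tasks;
        this.maxLoadPerCpu = maxLoadPerCpu;
    }

    private boolean hostIsIdle() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        double load = os.getSystemLoadAverage();   // -1 if unavailable on this platform
        int cpus = os.getAvailableProcessors();
        return load >= 0 && (load / cpus) < maxLoadPerCpu;
    }

    public void run() {
        try {
            while (!Thread.currentThread().isInterrupted()) {
                if (hostIsIdle()) {
                    Runnable task = tasks.poll(1, TimeUnit.SECONDS);
                    if (task != null) {
                        task.run();                // do grid work with the spare cycles
                    }
                } else {
                    Thread.sleep(5000);            // back off while the box's owner needs it
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();    // shut down cleanly
        }
    }

    public static void main(String[] args) {
        // Tiny demo: one scavenged task on a local queue.
        BlockingQueue<Runnable> queue = new LinkedBlockingQueue<Runnable>();
        queue.add(new Runnable() {
            public void run() { System.out.println("ran a scavenged task"); }
        });
        new Thread(new ScavengingWorker(queue, 0.5)).start();
    }
}
```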
Other common points made during the day:
Manageability is most important: better tools are needed for scheduling and allocating tasks to particular portions of the grid, and for understanding how workload correlates across disparate resources (by location, business, etc). There will always be hotspots, so a better understanding of utilization is key.
Open Standards -- almost everyone agreed that there needs to be better collaboration in the industry in creating standard APIs for accessing grids, and that the vendors should provide a way to abstract the banks' infrastructure and applications away from specific grid implementations. Adrian Kunzel felt that we should be able to do this NOW, and that we need to bring the banks' own experience to bear alongside the mainly academic contributions to grid so far. He also felt data fabrics / caching are about 3-5 years behind compute grids in this regard. Others agreed that there needs to be more focus on solving business problems than on IT issues. They also felt that standardization of grid APIs would help vendors like Murex and Calypso, as they already face the challenge of supporting multiple grid vendors' infrastructures. Additionally, the industry needs to challenge the virtual machine guys to create a standard format.
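No such standard API existed at the time of writing, but to picture what one might look like, here is a purely hypothetical sketch of a vendor-neutral compute-grid facade in Java. The names (GridTask, GridSession, PricingJob) are my own illustration; each vendor (Datasynapse, Platform, Gigaspaces) would need an adapter implementing the facade against its own SDK.

```java
import java.io.Serializable;
import java.util.List;
import java.util.concurrent.Future;

// Hypothetical sketch of a vendor-neutral compute-grid facade. The names are
// illustrative only; each grid vendor would need an adapter implementing
// GridSession against its own SDK.
interface GridTask<T extends Serializable> extends Serializable {
    T execute() throws Exception;                  // runs on a remote engine
}

interface GridSession {
    <T extends Serializable> Future<T> submit(GridTask<T> task);
    <T extends Serializable> List<Future<T>> submitAll(List<? extends GridTask<T>> tasks);
    void close();
}

// Application code then depends only on the facade, not on a vendor SDK:
class PricingJob implements GridTask<Double> {
    private final double notional;
    PricingJob(double notional) { this.notional = notional; }
    public Double execute() {
        return notional * 0.0001;                  // stand-in for a real pricing model
    }
}
```

With something along these lines, moving an application between grid vendors becomes an adapter change rather than an application rewrite, which is essentially the portability the banks were asking for.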