Migrating Legacy ECM Content to SharePoint

You may have noticed a distinct lack of posts on this site for some time now.  Well, the “issue,” if that’s what you want to call it, is that my daily tasks have been taking me deeper and deeper into a SharePoint migration specialization.  By SharePoint migration, I mean the task of migrating millions of documents from legacy systems into SharePoint.

These days, organizations are realizing that there are significant cost savings in leveraging SharePoint as their ECM system of choice, including for document and records management.  So while the rest of my friends at KnowledgeLake are standing up imaging systems (that use SharePoint as the document repository), I find myself performing large-scale SharePoint farm architecture in preparation for gigantic document migrations.

Organizations are figuring out that they can migrate FileNet, Documentum, Stellent, Content Manager, and many other legacy DM systems to SharePoint.  Once they take the initial hit of properly fortifying their SharePoint farm and executing the migration, they save big dollars by no longer paying the exorbitant maintenance costs that these legacy vendors impose.  They also gain economies of scale and operational efficiencies by standardizing on SharePoint as their one-stop shop for ECM content.

So, long story short, over the last several years my primary job has really been “Migration Man”.  Seriously, I’ve been doing so many migrations that I think I need to just show up to the office in a pair of tights and a cape with an “M” on it!  Sure, I do a lot of SharePoint architecture, and particularly a lot of storage architecture to hold that blast of new content.  But I’ve primarily been developing methodologies for loading SharePoint with millions of documents in a very short period of time.  For example, I recently migrated about 6.6 million documents into SharePoint in about 5 weeks.
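For perspective, here’s the back-of-the-envelope throughput that 6.6 million documents in 5 weeks implies.  This is just a sketch; the around-the-clock assumption is mine, since the actual migration schedule isn’t stated here.

```python
# Rough sustained throughput implied by ~6.6 million documents in ~5 weeks.
# Assumes the migration ran 24x7 (my assumption, not stated in the post).
docs = 6_600_000
weeks = 5
seconds = weeks * 7 * 24 * 3600  # 3,024,000 seconds in 5 weeks

docs_per_second = docs / seconds
docs_per_day = docs / (weeks * 7)

print(round(docs_per_second, 2))  # 2.18
print(round(docs_per_day))        # 188571
```

Even a couple of documents per second, sustained for weeks, means every layer of the farm (web front ends, SQL Server, and the storage underneath) has to keep pace, which is why the storage architecture work matters so much.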

For anyone interested, I recently posted a blog on the KnowledgeLake site covering some migration “best practices”.  Yes, I realize that “best practices” are largely subjective.  But whatever you want to call the guidance, it’s pretty good stuff.  I’ve been doing migrations for over 6 years.  So I’ve gotten pretty good at them and I know the factors that can make a migration successful as well as those that can make it go bust!



KnowledgeLake User Conference 2012

For a little warm-up to the new SharePoint 2013 speaking season, I’ll be tackling a session at the KnowledgeLake User Conference 2012.  KL puts on this conference every year in the day or two just before the annual SharePoint Conference put on by Microsoft.

I’ll be presenting a session titled SharePoint 2013 Search Architecture.  The goal is to provide a 10,000 foot overview of the “What’s New” type stuff with SharePoint 2013 Enterprise Search with an eye towards how it impacts Enterprise Content Management.  Should be a good primer session to help folks decide which drill down sessions they want to dig into at SharePoint Conference 2012.  I won’t be presenting at the (US) SharePoint Conference this year (hint, hint) so I’ll get an opportunity to enjoy the rest of the conference and soak in a lot of great information with the other attendees the rest of the week!

My session is shaping up nicely and the rest of the week should be a great time!



Apparently I took a year off!

Well, I’ve neglected this blog long enough.

First, it was the holidays. Then there was a bunch of cool stuff I learned about SharePoint 2013 that I wasn’t allowed to post yet. Then I got crazy busy at work. Excuses, excuses.

So it’s time to inject some new life into this blog. Over the coming months, I will be posting on a few topics mostly related to Enterprise Content Management in SharePoint 2013. I have a major speaking engagement coming up early in 2013 (which I’ll be announcing in the next few weeks) so I have a lot to prepare. With SharePoint 2013 now RTM’d, the worldwide knowledgebase will soon be exploding with information on the new platform. I plan to ride the wave with everyone else!

It’s going to be a busy year next year with SharePoint 2013 launching. So far I’m committed to at least one very big speaking engagement, possibly a book deal, and recertifying my active status as a Microsoft Certified Solutions Master (MCSM) for SharePoint. I’ll probably also put out a post or two on this MCM replacement certification.

So… Off we go!



SharePoint Saturday Denver 2011

I just realized that I haven’t posted about the fact that I’ll be in Denver this weekend speaking at the SharePoint Saturday event. I’ll be delivering a reprise of “Scaling SharePoint Archives to Terabytes and Beyond”, similar content to what I delivered at the SharePoint Conference in Anaheim.

In my opinion, Denver puts on one of the best SharePoint Saturday events in the country! In addition to a powerhouse roundup of high quality speakers, sponsorship is top notch. This year we can expect a great SharePint event on Saturday night and then wait for it… A ShareSki event! I can’t wait to carve up some powder at Loveland on Sunday!

So I look forward to seeing all of you as well as some of my speaker buddies that I only manage to run into at these SharePoint conferences!

Kudos to Planet Technologies and any other conference organizers for what will no doubt be a fantastic SharePoint weekend!



Scaling SharePoint Records Centers #spc11 #spc382

I recently presented a session at SharePoint Conference 2011 titled “Scaling SharePoint Document and Records Centers to Terabytes and Beyond”.  I was quite happy with how the session played out and I wanted to thank those of you who attended for the many kind tweets regarding my session.

I had hoped that the “collateral” links inside of MySPC would allow us speaker types to post zip files with supporting content (scripts, code, etc) for use by attendees.  Unfortunately, the collateral links only allow slide decks and PDF files.

So… as promised, here are the related demo resources that many of you asked for:




SharePoint 2010 MCM


Wow… It’s been a crazy summer.  The chaos started back in May when I began hardcore preparations for the SP2010 Upgrade rotation. This was followed by the actual rotation itself in early June.  Shout out to my U2 buddies out here!

So the rotation training was fantastic but the testing didn’t work out as well as I had hoped.  I made a critical mistake on my Qual Lab that cost me almost an hour to recover from.  It was enough to tank my effort.  Total bummer.  Kind of ruined my summer knowing I would have to go through the preparation process all over again.

After a nice summer vacation with the family, plus a few weekend excursions (and several baseball tournaments), we’re now into late July.  Time to work through some more hardcore Qual Lab preparation.  Good news: I passed on the second attempt!  I also managed to get my other prerequisite test out of the way.

All leading to the fact that as of September 21st, I’ve officially earned the SharePoint 2010 MCM certification!  Soooooo glad to have that off my plate before the SPC 2011 conference!

I added in some embarrassing detail here because I want folks to know that this just isn’t an easy process.  Yup.  I FAILed the Qual Lab the first time around.  I also missed the upgrade knowledge exam by a couple of points the first time around because I spent all my study time preparing for the Qual Lab!  The fact is, I don’t get to practice my dev skills as frequently as I would like in my daily job.  So I have to study REALLY hard and practice executing configurations and deployments that I don’t regularly encounter to overcome that lack of regular practical experience.

It’s a funny thing, really.  One of the instructors asked, “How many dev pros do we have in the class?” followed by “How many IT pros do we have in the class?”  It was a fascinating question.  When I started my SharePoint career back in the early SPS 2003 days, I was definitely a dev pro.  But over the years my skills and experience have gradually shifted to the IT pro side of the SharePoint house.  For the first time, I realized that while I can still develop any component necessary to meet a customer requirement, I no longer consider development to be my strongest asset.  Just an interesting observation.

The difficulty of the SharePoint MCM certification is intentional and absolutely necessary.  It’s not impossible but it will always be REALLY hard.  Broad practical experience is absolutely required for success.  This accomplishes two things.  First, it ensures that those who acquire the certification really do know their stuff.  Second, it preserves the integrity of the certification for those who went through the process before us!

I also have to say that while my initial experience with the SP2007 MCM certification process was very positive a couple years ago, the program has made significant progress!  Brett Geoffroy has done a fantastic job moving this program forward.  I’m amazed at how smoothly the process flows given the logistical complexity that is SharePoint MCM.  Props to Brett, the MCM instructor team, Microsoft for the investment in time and money, and all of those behind the scenes folks that I’ve never met but who contribute quietly in the background to make this program the success that it is!

Also… I’m not sure if the names are public yet so I won’t mention them specifically.  But I also wanted to congratulate my SharePoint MCA friends who recently passed their boards.  Great job guys!

Anyway, it’s all over and I’m the proud holder of the SharePoint 2010 MCM certification.  I look forward to another couple of years of not having to go through this process…. until the NEXT upgrade cycle begins and I start all over again!



SharePoint 2010 Enterprise Content Management

I am proud to announce the release of SharePoint 2010 Enterprise Content Management, published by Wrox (Wiley)!

This book is the combined effort of four authors, myself included.  As my first venture into the world of book writing, I didn’t want to tackle an entire book.  As it turns out, writing four chapters was quite a lot of work and represented an extensive time commitment.  I can only imagine what writing an entire book is like!  Kudos to those guys who can find time to write a whole book, get their 40+ hours per week in, and still remember the names of their wives and children!

So the book theme is obviously centered around how SharePoint 2010 can be leveraged as a powerful ECM platform.  The chapter list is included below.  If you’re curious, I highlighted the chapters that I wrote:

  • Chapter 1: What is Enterprise Content Management?
  • Chapter 2: The SharePoint 2010 Platform
  • Chapter 3: Document Management
  • Chapter 4: Workflow
  • Chapter 5: Collaboration
  • Chapter 6: Search
  • Chapter 7: Web Content Management
  • Chapter 8: Records Management
  • Chapter 9: Digital Asset Management
  • Chapter 10: Document Imaging
  • Chapter 11: Electronic Forms with InfoPath
  • Chapter 12: Scalable ECM Architecture
  • Chapter 13: ECM File Formats
  • Chapter 14: The SharePoint ECM Ecosystem
  • Chapter 15: Guidance for Successful ECM Projects

So if you happen to give it a read and you find the book useful, PLEASE go to amazon.com and let the world know what you thought of it!  We could use some positive reviews!  It would be bad form for us to review our own book so we need you!

I’m very proud of how the book turned out!




Speaking at SharePoint Conference 2011 in Anaheim, CA


I’m looking forward to October 3-6 in Anaheim, CA where I’ve been accepted to speak at SharePoint Conference 2011!  I’ll be presenting a session titled “Scaling SharePoint Records Centers to Terabytes and Beyond – Part 1”.

In this session, I’ll take you through architecture guidance for scaling Records Centers to incredible sizes.  After a little background on Records Centers and Document Centers and how they lend themselves to very large content archives, I’ll describe how they fit into an architecture that scales to multiple terabytes.  I’ll discuss the architecture from the ground up, starting with storage concepts in light of new guidance from Microsoft, continuing with scalable taxonomy, and finishing with tuning and monitoring.

Hope to see you at my session in Anaheim!



New Content Database and RBS Sizing Guidance

I was eagerly awaiting some new content database and RBS storage guidance that I had heard was coming from Microsoft.  Of course, they managed to release the new info when I was out on vacation.  Then I had planned on transferring my blog to a new hosting provider and didn’t want to put out any new posts using the old system.

So yes, this post is a little late and I’m sure half of the planet has already blogged about this, but since storage and RBS are something that I’ve talked a lot about, I feel like I need to add my 2 cents to this one.  What am I talking about?  I’m glad you asked…

The Microsoft SharePoint Team blog has published some new guidance regarding supported content database sizes.  They provide a summary of the changes and some nice background info in a blog post where they did a VERY good job highlighting the changes and pointing you to the new supported limit statements in TechNet.  In the sections below, I want to mention a couple of things related to the new guidance.

New RBS Storage Clarification

According to the new guidance, the storage consumed in a BLOB store by a related content database must be included in the overall content database size when evaluating the supported content database size limits.  I have to come clean RIGHT NOW and say that I’ve been a proponent of using RBS to skirt content database size limits in order to facilitate a more flexible taxonomy for large-scale content databases.  Unfortunately, I heard this concept long ago and latched onto it as a solution for an issue that I regularly encountered.  So I’ve been shouting this misinformation from the rooftops for a long time.  It turns out that TechNet never explicitly said we could exclude BLOB store storage from the content database size limit.  So for the record: please include BLOB store storage in your content database size number.  I’ll be passing this along in future speaking engagements as well.

So where does that put us?  Are we hosed if we deployed RBS?  In many cases, the answer is no.  I found that we typically wanted to use RBS for large scale document archive site collections based on either the Document Center or Records Center site template.  Since the new supported limits allow for larger content databases, in many cases we’re still within the limits.  If you’re not inside the limits but your system is functioning normally, Microsoft has given us the option of opening a paid supportability ticket so that their support team can “certify” systems that are beyond the limits.
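The new accounting boils down to simple arithmetic.  Here’s a minimal sketch in Python (the database and BLOB store sizes below are illustrative placeholders, not measurements from any real farm) showing how a database that looks tiny on its own can land well outside the general 200GB guidance once its BLOB store is counted:

```python
# Supported-size check when RBS is in play: per the updated guidance,
# the BLOB store bytes count toward the content database size.
# All sizes here are hypothetical examples.

GB = 1024 ** 3

def effective_db_size(mdf_bytes: int, blob_store_bytes: int) -> int:
    """Total size that counts against the supported limit."""
    return mdf_bytes + blob_store_bytes

# Example: a 150 GB database fronting a 900 GB BLOB store.
total = effective_db_size(150 * GB, 900 * GB)

GENERAL_LIMIT = 200 * GB        # general-usage guidance
ARCHIVE_LIMIT = 4 * 1024 * GB   # large archive scenarios (with prerequisites)

print(total // GB)              # 1050
print(total <= GENERAL_LIMIT)   # False
print(total <= ARCHIVE_LIMIT)   # True
```

In other words, a farm that looked comfortably inside the old limit with RBS excluded may now fall under the large-archive requirements instead, which is exactly the kind of case the paid supportability ticket is meant to cover.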

Is RBS Still a Useful Technology?

Yep.  But the new guidance ensures that it won’t be overused and abused.  OK, so we’ve got one less use case.  But RBS is still very beneficial in several scenarios:

  • Content Addressable Storage (CAS) Solutions – If your organization operates in a heavily regulated industry, it’s possible that you’re not allowed to delete documents.  Write Once Read Many (WORM) mode CAS storage devices such as Hitachi HCAP and EMC Centera are excellent solutions for ensuring that binaries live forever.  An RBS provider can take advantage of these CAS storage devices to ensure that binaries are never deleted.
  • Digital Asset Management Solutions – If you need to deploy a DAM site collection that will host 700 MB training videos for your organization, then RBS may again be a good fit.  Nobody wants to shove 700 MB files into a content database.  But with RBS enabled and the new 4 TB storage limit in place, it’s possible to store over 5,000 of those 700 MB training videos OUTSIDE of SQL Server (using RBS) instead of fewer than 300 under the old 200 GB limit.  Sure, it’s an edge case, but there are many similar solutions that might involve CAD files, large graphic assets, or huge 1,000-page PDF reports that need to be archived.
  • Compression, De-duplication, and Encryption Solutions – Looking for an extra layer of binary security for your files?  RBS can encrypt binary streams on the way to the BLOB store.  How about saving on some storage cost by enabling compression?  RBS can help you there too.  And a really sophisticated RBS provider can employ a de-duplication engine in the BLOB store to ensure that a given document is only stored once on the file system.
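The arithmetic behind the DAM numbers above is quick to verify.  A tiny sketch (using binary units, i.e. 1 GB = 1,024 MB, which is my assumption about how the limits are counted):

```python
# How many 700 MB videos fit under the old 200 GB content database
# limit versus the new 4 TB limit (binary units assumed).
video_mb = 700
old_limit_mb = 200 * 1024        # 200 GB expressed in MB
new_limit_mb = 4 * 1024 * 1024   # 4 TB expressed in MB

print(old_limit_mb // video_mb)  # 292  -> "fewer than 300"
print(new_limit_mb // video_mb)  # 5991 -> "over 5,000"
```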

These are just a few reasons why RBS is still an important technology.  But it’s important not to veer too far into the other ditch and try to use RBS on every large-scale solution.  If the requirements don’t dictate that RBS is necessary, then leave it on the shelf and avoid the additional backup/restore and upgrade complexity it brings.

General Usage Content Databases

For most content databases, Microsoft still wants us to hang out under that 200GB number we all know and love.  Essentially, if you don’t need to push the boundary, then don’t!  By staying under 200GB you ensure that you’re always supported, and you probably won’t have to deal with any of the performance optimizations that Microsoft requires in order to support larger content databases.  This is the 80% bracket that most solutions should fall into.  Performance, usability, database maintenance, backup/restore, and upgrade will all benefit if you architect your content databases to stay under 200GB.

4TB is the new 200GB

Wow, 4TB.  That’s a fun number.  Makes you want to just go out and re-architect your whole SharePoint environment, doesn’t it?  Um, please don’t.  There are reasons to push a content database to 4TB, but there is also a mountain of additional requirements that need to be addressed before you can even think about a number like 4TB.

Managing Very Large Content Databases

Microsoft went the extra mile here.  There are a few people who hold a wealth of knowledge regarding the planning, monitoring, and maintenance of super-gigantic content databases, and Bill Baer is one of those guys.  If you are entertaining the possibility of taking advantage of some of these new limits (or if you’re already there!), then you need to understand this whitepaper inside and out.  All of the juicy “how to” goodness is in there.

No Limits for Document Archive?

Personally speaking, this is really cool.  Professionally speaking, this scares the daylights out of me!  I’m guessing that about 18 months from now someone is going to ring up KnowledgeLake and ask for that MCM guy they’ve got who’s really good with storage and performance optimization.  It will take me about 30 seconds to look at their SQL Server and see that someone has jammed 5TB of content into a content database that is in no way optimized for it.  People, if you want to go larger than even 1TB, you’d better have some ridiculously spec’d-out storage!  Can it be done?  I think so.  Should it be done?  Depends on how big of a check you’re willing to write to make it happen.

In Conclusion…

I have to say that I’m really glad that Microsoft finally bit the bullet and gave us something we can work with.  For those of us who are willing to take the time and money to properly architect a large scale storage solution, this guidance really opens up a lot of doors.  But at the same time, I expect it to cause some issues as well.  Inevitably, systems will be targeted at these new numbers by people who don’t want to take the time to read ALL of the supporting guidance that enables these new boundaries.  Still, the guidance was sorely needed and at least now we have a stronger storage foundation to build upon.



I Haven’t Forgotten About You!

Blog Reader:  Hey Russ… What gives?  You haven’t written any blog entries lately?

Russ:  Yeah man.  I’ve been slammed with FAST Search for SharePoint 2010 install / configurations, baseball games for my boys, prepping for speaking engagements, and most of all, studying for my MCM 2010 upgrade which is coming up in June.

Blog Reader:  Well, I guess that’s OK.  But other than that MCM 2010 thing which is probably pretty rough, your other excuses are kind of weak.

Russ:  Yeah.  I know.  I promise I’ll get back in gear in July after I’ve had time to decompress from the MCM 2010 upgrade session in Redmond.

Yep.  That’s right folks.  I’ve been slackin’ on the blog thing.  Just not enough hours in the day to put the required thought into the posts that I want to do.  But that will change come July.  I’m planning another blog software refresh.  Something a bit more modern looking than what I’ve got going on right now.

I’m also planning a few blog posts on the topic of SharePoint 2010 storage best practices, as well as at least one post (maybe a series) on deploying FAST Search for SharePoint 2010.  The TechNet guidance isn’t very clear on the considerations that drive the all-important deployment.xml file, which basically controls the FAST farm topology.

Soooo… Hang with me just a bit longer.  I’ll be back soon…

