Saturday 24 September 2011



SharePoint 2010 Remote BLOB Storage (RBS)




LOCAL and REMOTE FILESTREAM Providers


There are two possible implementations of the FILESTREAM provider: the Local FILESTREAM provider and the Remote FILESTREAM provider. The FILESTREAM provider uses a database FILESTREAM file group to essentially turn the NTFS file system of the SQL Server host into a BLOB store.
When deploying the Local FILESTREAM provider the FILESTREAM file group is created directly in the database that is being RBS enabled.  This means that the same SQL Server that is processing requests from the client application database is also acting as a BLOB store (see Figure 1-3).
Figure 1-3

When deploying the Remote FILESTREAM provider, the FILESTREAM file group can be created in a database that is NOT the content database being RBS enabled (see Figure 1-4).  In fact, the FILESTREAM file group can even be created in a database that is hosted on an entirely DIFFERENT (or REMOTE) SQL Server (see Figure 1-5).  This means that a second SQL Server can be deployed and dedicated to servicing RBS BLOB store requests, allowing the SQL Server that hosts application databases to be dedicated to application processing.  In extreme scale environments this can result in a significant improvement to application scalability.


EBS

EBS (the External BLOB Storage API) was introduced in SharePoint 2007 SP1. It is still supported in SharePoint 2010. Moving forward, in order to have a more unified storage solution, SharePoint will likely support only RBS in the next release.

RBS and EBS


RBS is implemented by SQL Server (SQL Server 2008 and later only); it has nothing to do with SharePoint directly. When you enable RBS, all BLOB streams that SQL Server would normally have to store internally are written out to the file system instead.
RBS gives you a way to store the BLOBs outside of SQL Server, in a storage solution of your choice.

A client application, such as Microsoft SharePoint 2010, can take advantage of SQL Server 2008 R2 and Remote BLOB Storage to store BLOB data.

SQL Server 2008 R2 Remote BLOB Store (RBS) is an add-on feature pack for Microsoft SQL Server 2008 R2. RBS is designed to move the storage of binary large objects (BLOBs) from database servers to commodity storage solutions. With RBS, BLOB data is stored in storage solutions such as Content Addressable Stores (CAS), commodity hardware storage systems attached to databases and file systems, or cloud storage solutions like Microsoft Windows Azure.

FILESTREAM and RBS


FILESTREAM and RBS are complementary approaches for storage and management of BLOBs.

FILESTREAM provides a storage option that allows storage, efficient streaming and integrated management of large BLOBs in a SQL database by using the underlying NTFS file system for BLOB storage and streaming. It offers fully transactional access and operations compatible with varbinary(max). The SQL FILESTREAM feature can only use directly attached storage.

RBS is a set of standardized APIs that allow storage/retrieval of BLOBs outside of your main SQL database. This uses a provider model for plugging in any dedicated BLOB store that implements these RBS APIs.
There is an out-of-the-box RBS FILESTREAM provider that allows a deployment to use a SQL Server database (local or remote) as a dedicated BLOB store. This provider uses FILESTREAM as the BLOB storage mechanism and ties the two technologies together.


EBS is implemented by MOSS 2007 (available as a hotfix to MOSS 2007 SP1 and later). The EBS provider lives at the very bottom of the SharePoint stack, just above the interface into SQL Server. Just before the BLOB is passed to SQL Server, the EBS provider gives your process the opportunity to take ownership of the BLOB. In exchange, you give SharePoint a token so that it knows how to get the object back from you at a later date.
EBS is the only supported option on SharePoint 2007, and although it is deprecated in 2010 it is still supported.
Scenario: start with EBS on SharePoint 2007, upgrade to SharePoint 2010, and then migrate from EBS to RBS. Upgrading from 2007 to 2010 with the BLOBs remoted is possible.
NOTE: there is no OOB capability delivered by Microsoft. An EBS Provider is required, whether we build one or obtain it from a 3rd party.

Architecture of External BLOB Storage

Before the introduction of the external binary large object (BLOB) store provider (EBS Provider), the semantics of BLOB storage routed the binary data stream associated with a SharePoint file to the Microsoft SQL Server content database, which it shared with the site's structured data. Under that scenario, when you invoked a Save command on the SharePoint file, a parser in the Save path recognized the Save command and promoted a parcel of metadata out of the file stream. Then the metadata, along with the BLOB associated with the file, was stored in the SQL Server content database.
After installing, configuring and enabling the EBS Provider, the semantics change considerably. Now, instead of the Web application, the middle-tier storage access stack routes BLOB data streams and uses the EBS Provider to store BLOB data in the external BLOB store, and then returns metadata that allows it to retrieve the BLOB on demand. Importantly, the SharePoint object model is completely insulated from semantics of the EBS Provider, as well as the existence of an external BLOB store. This separation ensures that existing applications and services are fully agnostic to storage implementations. Only the storage access stack is aware of the existence and the semantics of the external BLOB store.



The EBS Provider is your custom implementation of the provider interface, the unmanaged ISPExternalBinaryProvider, and is integrated into the storage access stack as a COM component. If using native EBS, we must build the COM server ourselves and implement the ISPExternalBinaryProvider interface.
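To make the token exchange described above a little more concrete, here is a minimal Python sketch. It is not the COM interface itself: the class and method names (InMemoryExternalBlobStore, store_binary, retrieve_binary, ContentDatabase) are hypothetical stand-ins, and the "external store" is just a dictionary. It only illustrates the idea that the content database keeps metadata plus a token, while the provider owns the bytes.

```python
import uuid


class InMemoryExternalBlobStore:
    """Hypothetical stand-in for an EBS provider and its external BLOB store."""

    def __init__(self):
        self._blobs = {}

    def store_binary(self, data: bytes) -> str:
        """Take ownership of the BLOB and hand back an opaque token."""
        token = uuid.uuid4().hex
        self._blobs[token] = bytes(data)
        return token

    def retrieve_binary(self, token: str) -> bytes:
        """Return the BLOB that was stored under this token."""
        return self._blobs[token]


class ContentDatabase:
    """Hypothetical content database: keeps metadata and the token only."""

    def __init__(self, provider: InMemoryExternalBlobStore):
        self._provider = provider
        self.rows = {}

    def save_file(self, name: str, data: bytes) -> None:
        # The BLOB goes to the provider; only its token is kept in the row.
        token = self._provider.store_binary(data)
        self.rows[name] = {"size": len(data), "blob_token": token}

    def open_file(self, name: str) -> bytes:
        # The stored token is used to fetch the BLOB back on demand.
        return self._provider.retrieve_binary(self.rows[name]["blob_token"])
```

Code that calls save_file and open_file never sees the token, which mirrors how the SharePoint object model is insulated from the external BLOB store.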

Alternatives
·         Use AvePoint’s FREE DocAve Extender
·         Use StoragePoint (MetaLogix)
·         Quest


RBS vs EBS – Pros and Cons


Remote BLOB Storage (RBS) – SQL 2008 only

Pros:

·         RBS is implemented in SQL Server and is application agnostic. That is to say, if you turn RBS on then all BLOB objects from any SQL Server-based application will be externalized. If that’s what you want to happen then that’s great, but if you need to be able to apply business logic to what is externalized and where it goes then you are severely restricted.
·         It is simple – you turn RBS on and the content is simply stored on to the local file system. If you have some kind of file system virtualization software in place then you can do some basic management tasks but only based on the file system attributes of the object.
·         If you want access to the context and metadata of the object then you are going to have to dip in to SQL Server and start hunting down SharePoint based reference information; Microsoft do not recommend this - in fact they do not publicly publish the DB schema for SharePoint so it would be potentially dangerous.
·         The current thinking is that RBS might have more longevity than EBS. It is likely that EBS will fade out of the stack over time – obviously this is not 100% certain but likely.

  Cons:

·         Getting the content out of SQL Server only solves 5% of the real issues according to 9 of my 10 personalities. Seriously, getting the BLOBs out of SQL Server gives you scalability but it does not deliver any of the IT efficiencies, compliance overlays, or re-purpose/re-use benefits of managing the externalized content.
·         Intelligent archiving is the key to getting this right. You need to have the BLOB, the metadata, the context and the ability to manage the object – no less than this. The RBS model only provides the BLOB – no context and no ability to manage the object.
·         No business rule mapping…RBS is all or nothing – you get all BLOBs all of the time. EBS is not much better but does support certain rules. For example, in theory you could configure EBS to not externalize content from certain sites or content less than 50KB in size.
·         Needs SQL Server 2008 – not a huge deal but a consideration.
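As an illustration of the kind of business rule mentioned in the list above (skip certain sites, skip content under 50KB), here is a small Python sketch. Everything in it is hypothetical: ExternalizationRule and its fields are names invented for this example, not part of any EBS or RBS API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExternalizationRule:
    """Hypothetical rule: which BLOBs should leave the content database."""

    min_size_bytes: int = 50 * 1024          # the 50KB example from the text
    excluded_sites: frozenset = frozenset()  # sites whose content stays in SQL

    def should_externalize(self, site_url: str, size_bytes: int) -> bool:
        # Content from excluded sites is never externalized.
        if site_url in self.excluded_sites:
            return False
        # Small BLOBs stay in the database; the rest are externalized.
        return size_bytes >= self.min_size_bytes
```

With an all-or-nothing model the equivalent function would simply return True for every BLOB.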

  External BLOB Storage (EBS)

Pros:

·         Implementing EBS in SharePoint:
o   allows file metadata to be stored in a Content Database, and
o   the actual file contents to be stored in an external file system
·         EBS is provided by the SharePoint team and although it is lacking in some areas it does understand the context of the BLOB that it exposes. In other words, we do know what the BLOB object is and we can track changes/deletes on the object.
·         The architecture allows us to provide an intelligent process for capturing the BLOB and, just as importantly, for returning the BLOB on demand (i.e. when you want to view it from SharePoint).
·         Because we are interacting directly with the SharePoint processes we can perform more intelligent operations. For example, if the BLOB was deleted (with good reason) from the store then we could cascade that delete back up to SharePoint. Same with changes to the object or its status.
·         It does not require SQL Server 2008

Cons:

·         When an edited or updated SharePoint file is saved, the EBS Provider does not overwrite the existing BLOB. Instead, the EBS Provider creates a new BLOB stream and places it in the external BLOB store. Metadata references to the original BLOB and its Blob_Id are deleted from the SQL Server database; then, the EBS Provider creates a new reference (with a new Blob_Id) to associate the newly created BLOB.
·         This scheme results in the accumulation of orphaned BLOBs in the external BLOB store; depending on the rate of updates to BLOB data in your system, the number of orphan BLOBs can easily become quite large. For this reason, you must analyse and quantify your rate of update and implement garbage collection on a schedule that is sufficient to maintain your external BLOB store efficiently.
o   Garbage collection itself is a resource-intensive operation, both for the external BLOB store and for the SQL Server content database, so you must ensure that you schedule garbage collection in a way that does not compete for application resources during periods of peak usage
·         EBS is deprecated in SharePoint 2010 - DEPRECATED does not equal UNSUPPORTED
·         In its current version, external BLOB storage is supported only at the scope of the farm (SPFarm). Furthermore, you must install, configure, and enable the EBS Provider on each front-end Web server in the farm.
·         Any error conditions, resource drag, or system latency that is introduced by using the EBS Provider, or in the external BLOB store itself, are reflected in the performance of the SharePoint site generally.
·         No utility is available for moving BLOB data from the content database into the external BLOB store. Therefore, when you install and enable the EBS Provider for the first time, you must manually move existing BLOBs that are currently stored in the content database to your external BLOB store. You can do this in two ways.
o   You can use site-level backup and restore methods on the SPSiteCollection object to back up and restore your entire data store. First back up all of your data, then install and enable the EBS Provider. When you then invoke the restore operation, the EBS Provider routes BLOB streams to the external BLOB store, while routing structured data to the appropriate tables in the content database.
o   A second, passive method relies on the issue discussed earlier in this section. Because new BLOB data streams are saved to the external BLOB store whenever a file is updated, you can simply install and enable the EBS Provider and then allow the natural process of file updates to gradually populate the external BLOB store with binary files that were previously held in the content database. When an updated file is routed to the external BLOB store, the previous version of the file is automatically deleted from the content database.
·         When using an external BLOB store with the EBS Provider, you must re-engineer your backup and restore procedures, as well as your provisions for disaster recovery, because some backup and restore functions operate on the content database but not on the external BLOB store. You must handle the external BLOB store separately
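The orphaned-BLOB problem described in the list above comes down to a set difference: anything in the external store that the content database no longer references is garbage. The Python sketch below assumes the store can be modeled as a dictionary keyed by BLOB ID and that the referenced IDs have already been read from the content database; the function names are hypothetical and a real provider would have to do this against its own storage.

```python
def find_orphans(stored_ids, referenced_ids):
    """Return the BLOB IDs present in the store but no longer referenced."""
    return set(stored_ids) - set(referenced_ids)


def collect_garbage(blob_store: dict, referenced_ids) -> int:
    """Delete orphaned BLOBs from the (dict-modeled) store.

    Returns the number of BLOBs removed.
    """
    orphans = find_orphans(blob_store.keys(), referenced_ids)
    for blob_id in orphans:
        del blob_store[blob_id]
    return len(orphans)
```

In practice both steps are expensive, since one side scans the external store and the other queries the content database, which is why the run should be scheduled outside peak hours.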

Configurations


Supported
Not supported

 

·         MOSS + EBS + SQL 2005
·         MOSS + EBS + SQL 2008
o   Need to write custom code that leverages Microsoft’s APIs to create own BLOB management functionality
o   Buy third party tool
o   Use AvePoint’s FREE DocAve Extender
·         MOSS + RBS + SQL 2005 = NO
·         MOSS + RBS + SQL 2008
o   Can make SharePoint 2007 work with SQL 2008 RBS. Have to change the Content column on the AllDocStreams and AllDocVersions tables from Image to  VarBinary(max) - effectively making the implementation unsupported by Microsoft, but it will work

·         SP2010 + EBS + SQL 2005
·         SP2010 + EBS + SQL 2008
o   EBS is deprecated in SharePoint 2010 - DEPRECATED does not equal UNSUPPORTED
·         SP2010 + RBS + SQL 2005 = NO
·         SP2010 + RBS + SQL 2008
o   Optimal configuration



Backups and High Availability

When you use SharePoint Backup and Restore, it doesn’t really matter where the BLOB objects are stored; this is taken care of by SQL RBS. So when you use SharePoint Backup on an RBS-enabled content database, it will still back up everything. You can also restore the backup to a non-RBS-enabled database, in which case the BLOB objects will be inside the database. SQL RBS decides where to put them, not SharePoint.
If you back up the database by using SQL Backup, keep in mind that this will not back up the BLOB objects. You need to back up the database first, then manually back up the BLOBs (with XCOPY or another file backup tool). When restoring, reverse the order: restore the BLOBs first, then the databases.
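The ordering in the paragraph above is easy to get backwards, so here is a tiny Python sketch of it. The four callables are hypothetical placeholders for whatever actually performs each step (a SQL backup job, an XCOPY script and so on); the only point is the sequence.

```python
def run_backup(backup_database, backup_blobs):
    """Back up the database first, then the BLOB store."""
    backup_database()
    backup_blobs()


def run_restore(restore_blobs, restore_database):
    """Restore in the reverse order: BLOBs first, then the database."""
    restore_blobs()
    restore_database()
```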
FILESTREAM maintains transactional consistency between the structured and unstructured data at all times, even allowing point-in-time recovery of FILESTREAM data using log backups.
Consistency is maintained automatically by SQL Server and does not require any custom logic in the application.
The FILESTREAM mechanism does this by maintaining the equivalent of a database transaction log, which has many of the same management requirements. The combination of the database’s transaction log along with the FILESTREAM transaction log allows the FILESTREAM and structured data to be (transactionally) recovered correctly.

 

High Availability Considerations


·         SQL Server Backup/Restore.  SQL Server backup and restore operations are supported for only the BLOB references in the database.  BLOB data in the BLOB store does not get backed up.  A separate solution, implemented at the scope of the BLOB store, must be implemented to ensure the availability of BLOB data.
·         Log Shipping.  Log shipping is directly supported with the FILESTREAM provider.  Other RBS providers must specifically implement support for log shipping or the feature will not be available.
·         Clustering.  Clustering is supported by FILESTREAM provider because the FILESTREAM file group will be mounted locally for the cluster servers.  The ancillary tables also support clustering so RBS providers other than FILESTREAM will work with a SQL cluster.  However, it is important to note that a SQL cluster only guarantees high availability of the BLOB references in the database when the provider is a FILESTREAM provider.  A separate solution, implemented at the scope of the BLOB store, must be implemented to ensure the availability of BLOB data.
·         Mirroring and Replication.  SQL mirroring and replication is supported for only the BLOB references in the database as long as BLOB data mirror processing can stay ahead of database (metadata) mirroring to ensure link level consistency.  BLOB data in the BLOB store is not mirrored or replicated.  Many BLOB store devices can provide mirroring and replication capabilities.  These features must be supported by the RBS provider in order for mirroring and replication of both the SQL data and the BLOB store data to be successful.  SQL mirroring can’t be used with SQL Server 2008 R2 if the LOCAL FILESTREAM provider is used.  Mirroring can be used for the client/content database, but not on the remote BLOB store database if REMOTE FILESTREAM provider OR a third party provider is used.



When to use RBS

·         You have huge content databases for document archiving (terabytes of data) and want to reduce storage cost.
·         You have large media files to stream to your audience.
·         You need RBS to integrate 3rd party storage/archiving solutions with SharePoint (for example, EMC Documentum).
While structured and semi-structured data can easily be stored in a relational database, the choice of where to store unstructured or BLOB data is more complicated. When deciding where to store BLOB data, consider the following requirements:
·         Performance: The way the data is going to be used is a critical factor. If streaming access is needed, storing the data inside a SQL Server database may be slower than storing it externally in a location such as the NTFS file system. Using file system storage, the data is read from the file and passed to the client application (either directly or with additional buffering). When the BLOB is stored in a SQL Server database, the data must first be read into SQL Server’s memory (the buffer pool) and then passed back out through a client connection to the client application. Not only does this mean the data goes through an extra processing step, it also means that SQL Server’s memory is unnecessarily “polluted” with BLOB data, which can cause further performance problems for SQL Server operations.
·         Security: Sensitive data that needs to have tightly-managed access can be stored in a database and security can be controlled using the usual SQL Server access controls. If the same data is stored in the file system, different methods of security such as access control lists (ACLs) need to be implemented.
·         Data size: Based on the research cited later in this white paper, BLOBs smaller than 256 kilobytes (KB) (such as widget icons) are better stored inside a database, and BLOBs larger than 1 megabyte (MB) are best stored outside the database. For those sized between 256 KB and 1 MB, the more efficient storage solution depends on the read vs. write ratio of the data, and the rate of “overwrite”. Storing BLOB data solely within the database (e.g., using the varbinary(max) data type) is limited to 2 gigabytes (GB) per BLOB.
·         Client access: The protocol that the client uses to access SQL Server data, such as ODBC, may not be suited to applications such as streaming large video files. This may necessitate storing the data in the file system.
·         Transactional semantics: If the BLOB data has associated structured data that will be stored in the database, changes to the BLOB data will need to adhere to transactional semantics so the two sets of data remain synchronized. For example, if a transaction creates BLOB data and a row in a database table but then rolls back, the creation of the BLOB data should be rolled back as well as the creation of the table row. This can become very complex if the BLOB data is stored in the file system with no link to the database.
·         Data fragmentation: Frequent updates and overwrites will cause the BLOBs to move, either within the SQL Server database files or within the file system, depending on where the data is stored. In this case, if the BLOBs are large, then they may become fragmented (i.e., not stored in one contiguous part of the disk). This fragmentation can be more easily addressed by using the file system than by using SQL Server.
·         Manageability: A solution that uses multiple technologies that are not integrated will be more complex and costly to manage than an integrated solution.
·         Cost: The cost of the storage solution varies depending on the technology used.
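The data size guidance in the list above can be written down as a simple rule of thumb. The Python sketch below encodes only the thresholds quoted there (256 KB, 1 MB, and the 2 GB varbinary(max) limit); the function names and return strings are made up for this example, and the middle band is deliberately left as "depends" because the text says it hinges on read/write ratio and overwrite rate.

```python
KB = 1024
MB = 1024 * KB

# varbinary(max) holds at most 2^31 - 1 bytes per value.
VARBINARY_MAX_BYTES = 2**31 - 1


def recommend_storage(blob_size_bytes: int) -> str:
    """Rule of thumb from the size guidance: database, external, or depends."""
    if blob_size_bytes < 256 * KB:
        return "database"
    if blob_size_bytes > 1 * MB:
        return "external"
    return "depends"  # 256 KB to 1 MB: read/write ratio and overwrite rate decide


def fits_in_varbinary_max(blob_size_bytes: int) -> bool:
    """True if the BLOB could be stored solely inside the database at all."""
    return blob_size_bytes <= VARBINARY_MAX_BYTES
```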

The Case for RBS


When Storage needs to be Optimized


When BLOB data is allowed to inflate a SQL database, file I/O and processing load is increased on the database server.  If the average size of BLOB data is 80KB or higher, then implementing RBS reduces I/O and processing load which improves the performance of SQL Server.
Also, RBS providers have an additional advantage in that they can perform additional processing on the BLOB stream as it is being passed to the BLOB store.  For example, some providers may compress and decompress the BLOB content on the way to and from the BLOB store.  This results in a smaller overall storage footprint for the same BLOB data.  This practice should be carefully managed such that significant additional storage and retrieval latency is not added to BLOB store operations.
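As a rough illustration of a provider that compresses on the way in and decompresses on the way out, here is a Python sketch using the standard zlib module. CompressingBlobStore is a hypothetical class backed by a dictionary, not a real RBS provider; it only shows where the extra processing step sits and why the stored footprint can shrink.

```python
import zlib


class CompressingBlobStore:
    """Hypothetical BLOB store that compresses content before storing it."""

    def __init__(self):
        self._blobs = {}

    def put(self, blob_id: str, data: bytes) -> None:
        # Compress on the way into the store.
        self._blobs[blob_id] = zlib.compress(data)

    def get(self, blob_id: str) -> bytes:
        # Decompress on the way out, so callers see the original bytes.
        return zlib.decompress(self._blobs[blob_id])

    def stored_size(self, blob_id: str) -> int:
        """Size of the BLOB as actually held in the store."""
        return len(self._blobs[blob_id])
```

The compress and decompress calls are exactly the added latency the paragraph above warns about, so the trade-off depends on how compressible the content is.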

Large Database of Mostly Binary Data


If a given SQL Server database would grow to 500GB without RBS enabled, then RBS would be a beneficial option.  A 500GB database is considerably large.  Having a very large database can have a negative impact on business continuity and maintenance operations.
·         Backup and restore operations take considerably longer
·         Index and statistics defragmentation takes considerably longer.  This is a particular concern if the database must be taken offline during defragmentation.
·         Regular DBCC consistency checks will take much longer.  If database integrity is not regularly monitored, the risk of a corrupted database is considerably increased.  Larger databases will have a higher risk of corruption due to physical storage errors simply because of the large quantity of storage they consume.
For these reasons, enabling RBS on an otherwise very large database can be very beneficial, as each of the concerns above is alleviated.
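The two thresholds from this section (an average BLOB size of 80KB or more, and a database that would reach 500GB without RBS) can be combined into a quick check. The Python sketch below is only a reading of those two numbers; the function name and the wording of the reasons are hypothetical, and it is no substitute for measuring your own farm.

```python
KB = 1024
GB = 1024 ** 3


def rbs_indicators(avg_blob_bytes: int, projected_db_bytes: int) -> list:
    """Return the reasons, if any, that point towards enabling RBS."""
    reasons = []
    if avg_blob_bytes >= 80 * KB:
        reasons.append("average BLOB size is 80KB or higher")
    if projected_db_bytes >= 500 * GB:
        reasons.append("database would reach 500GB without RBS")
    return reasons
```

An empty list means neither threshold from the text applies.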

Install and configure RBS (SharePoint Server 2010)

To use RBS, you must do the following:

 

Third party RBS providers

·         StoragePoint/EMC (Metalogix)

·         AvePoint also made their DocAve Extender free: an EBS provider for 2007, and an RBS provider for 2010

·         Quest

·         NetApp, OpenText and CommVault are also working on it.

·         STEALTH Software also offers RBS providers


References


External Storing of Binary Large Objects (BLOBs) in Windows SharePoint Services


Creating an External Storage Solution for SharePoint


FAQ: SharePoint 2010 Remote BLOB Storage (RBS)


