SharePoint – Export SharePoint Documents And Metadata Using Power Shell Script
Recently one of our client wanted to move out of SharePoint and the customer have a Sharepoint farm with 800 web sites. We have asked to extract all content from a SharePoint site with their metadata for archival. They need all documents to be download to a folder structure as it is in the folder hierarchy in the site.They would also like to have a CSV file for each document library, which will contain all metadata of each document including some other important data of the document .e.g. Site Url, Document Path after downloaded, Mime Type and Checksum of each document. Customer will use this information to migrate the contents to destination which is implemented using different technology. So we have used PowerShell to address the customer’s requirement.
After googling some time I have come across a nice blog written by Nik Craik .We have used the script from this blog as our base script, then modified and added new features as per our requirement. Along with the required features, we have included logging feature. The log file provides details like, number of documents in a site and number of document exported from that site. It also log the details of the error if there is any during the export.
In our case we have used a csv file, which will provide all required inputs to the script. The file contains list of site urls, need to be export and a holder name for each site so that a folder will be create as per holder name and the document will be download to their respected holder name folder.
In this script we have, restrict the script so that it will only download documents from the document library which template id is 101 and avoid all other document library. You can modify the script as per your requirement.
I put detailed comments in the script, to make it self explanatory.
<#
.SYNOPSIS
Export the metadata of all document to a CSV file. We can provide multiple site url so
that at a time we can export documents from multiple sites.
.PARAMETER InputFilepath
The File path is a path where the csv file exist. this file contains all the site url and holder name.
This file will provide the inputs to the script
e.g "C:\M\Script\Site_Details.csv"
.PARAMETER Destination
The location path,where the script will Download files and its metadata .
e.g "C:\M\Export"
#>
param
(
[parameter(Mandatory=$true)][string]$InputFilepath,
[parameter(Mandatory=$true)][string]$Destination
)
#Check and add PowerShell snap in
if(-not(Get-PSSnapin | Where { $_.Name -eq "Microsoft.SharePoint.PowerShell"}))
{
Add-PSSnapin Microsoft.SharePoint.PowerShell -erroraction SilentlyContinue;
}
#The File path is a path where the log file will be create.
#e.g "C:\M\Export\logs.txt"
$logPath = ''
#Create a new folder under the site destination folder for document where the actual folder structure is very deep.
# So that some time,the long url issue can be avoied.
$longFolderDocumentPath="\Long Path Docs\"
$FinalDestination=''
# variable to hold total number of document in the provided site
[int]$SiteDocumentCount= 0
[int]$DownloadedDocCount=0
#Export site by taking both input
ExportSite $InputFilepath $Destination
# The initial method to start exporting both site document and its metadata
#function ExportSite($site,$destination)
#$filepath : the path of csv file
#$Defaultdestination : location on file system , to save the exported files
function ExportSite($filepath,$Defaultdestination)
{
$logPath =$Defaultdestination+'\logs.txt'
try
{
#In this csv we have used ';' as delimeter because , few site url contains ',' so it breaks the coad
$createList = Import-Csv -path $filepath -Delimiter ';'
}
catch
{
write-host "Import-Csv : Could not find the site details csv file at provided path " $filepath
Write "Import-Csv : Could not find the site details csv file at provided path ::$( $filepath)" >>$logPath
Write "Error:: "$_.Exception.Message >>$logPath
return
}
#iterating through each site url
ForEach($row in $createList)
{
#getting each site url
$SiteUrl = $row.SiteUrl
if($SiteUrl.length -gt 0)
{
#Getting holder name of the site if exists
$currFolderPath=$row.HolderName
# check if holder name is not exist then it will set the destination path to root folder
#other wise it will create a folder with name of the holder name for the site under the root folder and
#set it as destination folder
if($currFolderPath.Length -gt 0)
{
$FinalDestination=$Defaultdestination+'\'+$currFolderPath
$currFolderPath=''
}
else
{
$FinalDestination=$Defaultdestination
}
try
{
$exportStartTime = Get-Date -Format "hh:mm:ss"
Write "------------------------------------------------------------------------" >>$logPath
Write "$($exportStartTime): : The Export started for the site : $($SiteUrl) " >>$logPath
#Download Site Documents + Versions
Download_DocumentLibraries $SiteUrl $FinalDestination
#Export Site Lists and Document Library Metadata
Export_Metadata $SiteUrl $FinalDestination
}
catch
{
write-host " Issue with accessing the site : " $SiteUrl
Write " Issue with accessing the site : : $($SiteUrl)" >>$logPath
Write "Error: "$_.Exception.Message >>$logPath
return
}
$exportEndTime = Get-Date -Format "hh:mm:ss"
Write "$($exportEndTime) : : The Export ended for the site : $($SiteUrl)" >>$logPath
Write "-------------------------------------------------------------------" >>$logPath
}
else
{
continue
}
}
}
#Function :Export_Metadata
#Description : Export_Metadata function collect all required metadata and put them in a csv file and save the file in provided location.
#sourceweb: document library's parent Sit url, for which the metadata need to be export.
#metadatadestination: location where the metadata csv file will be stored
function Export_Metadata($sourceweb, $metadatadestination)
{
try
{
Write-Host "Creating Lists and Metadata"
$sourceSPweb = Get-SPWeb -Identity $sourceweb
}
catch
{
Write-Host "Cannot find a SPWeb object with provided Id or Url : " $sourceweb
Write "Cannot find an SPWeb object with provided Id or Url : " $sourceweb >>$logPath
Write "Error: "$_.Exception.Message >>$logPath
return
}
#Create a folder using the site's name to contains the documents and metadata of a site
$siteDocFolder = $metadatadestination + "\" +$sourceSPweb.Title +"\Document Libraries"
$metadataFolder = $metadatadestination+"\"+$sourceSPweb.Title+"\Document Metadatas"
$createMetaDataFolder = New-Item $metadataFolder -type directory
$metadatadestination = $metadataFolder
#as per our requirement we have restrict the document library to custom document library only
$mylistsToIgnore = @(101)
$listArray = @()
# iterate through all document library
foreach($list in $sourceSPweb.Lists)
{
$ListItemCollection =@()
$listTemplate = $sourceSPweb.ListTemplates | ? {$_.FeatureId -eq $list.TemplateFeatureId}
if ($mylistsToIgnore.Contains($listTemplate.Type_Client) -and $list.Hidden -eq $False -and !$list.DefaultViewUrl.Contains('/FormServerTemplates/') -and !$list.DefaultViewUrl.Contains('/SiteCollectionDocuments/') -and !$list.DefaultViewUrl.Contains('/Style Library/') -and !$list.DefaultViewUrl.Contains('/SiteAssets/') )
{
$ListItems = $list.Items
if($ListItems.Count -gt 0)
{
$Listlocation = $metadatadestination+"\"+$list.Title+".csv"
Write-Host "Exporting List MetaData: " $list.Title
#Get All List items where Status is "In Progress"
try
{
$list.Items | foreach {
try{
#Construct Document Folder Path \url where the document already downloaded
$FolderPath=$_['FileDirRef']
if($FolderPath.length -gt 0 )
{
#$FolderPath=$FolderPath -replace $sourceSPweb.ServerRelativeUrl,''
$FolderPath=$FolderPath.Replace($sourceSPweb.ServerRelativeUrl,'')
$pathArray=$FolderPath.split('/',[System.StringSplitOptions]::RemoveEmptyEntries)
$Folderpath =$siteDocFolder
for($i=0; $i -le $pathArray.length; $i++)
{
$Folderpath += '\'+ $pathArray[$i]
}
}
$DocumentPath= $Folderpath + $_['Name']
$DocumentPath=$DocumentPath.Replace('\','/')
if($DocumentPath.length -ge 250)
{
$DocumentPath= $siteDocFolder + $longFolderDocumentPath + $_['Name']
$DocumentPath=$DocumentPath.Replace('\','/')
}
# End of Construct Document Folder Path\url
#Add Site Url, Document path, Mime type, Checksum as a column to the output csv file as document metadata
try
{
$myMimeType = $(Get-MimeType -CheckFile $DocumentPath)
$checkSum = $(Get-Checksum -CheckFile $DocumentPath)
}
catch
{
Write-Host "Document not found at : "$DocumentPath
Write "Document not found at : "$DocumentPath >>$logPath
Write "Error: "$_.Exception.Message >>$logPath
$myMimeType=''
$checkSum=''
}
$ExportItem = New-Object PSObject
$ExportItem | Add-Member -MemberType NoteProperty -name "Site URL" -value $sourceSPweb.Url
$ExportItem | Add-Member -MemberType NoteProperty -name "Site Name" -value $sourceSPweb.Title
$ExportItem | Add-Member -MemberType NoteProperty -name "Document Path" -value $DocumentPath
$ExportItem | Add-Member -MemberType NoteProperty -name "Mime Type" -value $myMimeType
$ExportItem | Add-Member -MemberType NoteProperty -name "Checksum" -value $checkSum
$myMimeType=''
$checkSum=''
#Get Each field
foreach($Field in $_.Fields)
{
try{
# uncomment the following line if we dont need the metadata of hidden field
#if(!$Field.Hidden)
#{
$fieldValue =$_[$Field.InternalName]
if($Field.InternalName -eq 'Modified_x0020_By' -or $Field.InternalName -eq 'Created_x0020_By')
{
if($fieldValue.length -gt 0 )
{
$fieldValue=$fieldValue.Replace('\','/')
}
$ExportItem | Add-Member -MemberType NoteProperty -name $Field.Title -value $fieldValue -Force
}
elseif($Field.Title -eq $null -or $Field.InternalName -eq 'LinkFilenameNoMenu' -or $Field.InternalName -eq 'LinkFilename' )
{
if($Field.InternalName -eq 'Author' -or $Field.InternalName -eq 'Editor' -or $Field.InternalName -eq 'Owner')
{
$userValue=$fieldValue
if($userValue.length -gt 0 )
{
$userValueArray=$userValue.split('#',[System.StringSplitOptions]::RemoveEmptyEntries)
if($userValueArray.length -gt 0 )
{
$User=$userValueArray[1]
$ExportItem | Add-Member -MemberType NoteProperty -name $Field.InternalName -value $User -Force
}
}
else
{
$ExportItem | Add-Member -MemberType NoteProperty -name $Field.InternalName -value '' -Force
}
}
else
{
$ExportItem | Add-Member -MemberType NoteProperty -name $Field.InternalName -value $fieldValue -Force
}
}
else
{
if($Field.InternalName -eq 'Author' -or $Field.InternalName -eq 'Editor' -or $Field.InternalName -eq 'Owner')
{
$userValue=$fieldValue
if($userValue.length -gt 0 )
{
$userValueArray=$userValue.split('#',[System.StringSplitOptions]::RemoveEmptyEntries)
if($userValueArray.length -gt 0 )
{
$User=$userValueArray[1]
$ExportItem | Add-Member -MemberType NoteProperty -name $Field.Title -value $User -Force
}
}
else
{
$ExportItem | Add-Member -MemberType NoteProperty -name $Field.InternalName -value '' -Force
}
}
else
{
$ExportItem | Add-Member -MemberType NoteProperty -name $Field.Title -value $fieldValue -Force
}
}
#} # hidden fields
}
catch
{
Write-Host "Error to get metadata value of field:" $Field.Title "of current item"
Write " WARN : Error to get metadata value of field:" $Field.Title "of current item">>$logPath
Write "Error: "$_.Exception.Message >>$logPath
continue
}
}
#Add the object with property to an Array
$ListItemCollection += $ExportItem
}
catch
{
Write-Host "Error to get metadata of document : " $_['FileDirRef'] "of current document Library"
Write "WARN : Error to get metadata of document:" $_['FileDirRef'] >>$logPath
Write "Error: "$_.Exception.Message >>$logPath
continue
}
}
}
catch
{
Write "Error: "$_.Exception.Message >>$logPath
Write-Host "Error: "$_.Exception.Message
}
#Export the result Array to CSV file
$ListItemCollection | Export-CSV $Listlocation -NoTypeInformation -Encoding UTF8
Write-host "Library Metadata Exported to $($Listlocation) for site $($list.Title)"
Write "Library Metadata Exported to $($Listlocation) for site $($list.Title)">>$logPath
}
}
}
}
# Function: Download_DocumentLibraries
# Description: Calls DownloadDocuments recursiveley to download all document libraries in a site.
# Variables
# $webUrl: The URL of the site to download all document libraries
function Download_DocumentLibraries($webUrl,$documentDestination)
{
try
{
Write-Host "Downloading Documents from site :" $webUrl
Write "Downloading Documents from site :" $webUrl >>$logPath
$web = Get-SPWeb -Identity $webUrl #-ErrorAction Stop
}
catch
{
Write-Host "Cannot find an SPWeb object with provided Id or Url : " $web
Write "Cannot find an SPWeb object with provided Id or Url : " $web >>$logPath
Write "Error: "$_.Exception.Message >>$logPath
return
}
#Create a folder using the site's name
$siteFolder = $documentDestination + "\" +$web.Title+"\Document Libraries"
$createSiteFolder = New-Item $siteFolder -type directory
$destination = $siteFolder
$mylistsToIgnore = @(101) # This will filter only those document library which templare id is 101.
$global:SiteDocumentCount=0 # it will set the number of documentSet Counter to 0 so that finally it will show the number of documents downloaded
$global:DownloadedDocCount=0
foreach($list in $web.Lists)
{
try
{
$listTemplate = $web.ListTemplates | ? {$_.FeatureId -eq $list.TemplateFeatureId}
if($listTemplate.Type_Client -eq 101)
{
if ($mylistsToIgnore.Contains($listTemplate.Type_Client) -and $list.Hidden -eq $False -and !$list.DefaultViewUrl.Contains('/FormServerTemplates/') -and !$list.DefaultViewUrl.Contains('/SiteCollectionDocuments/') -and !$list.DefaultViewUrl.Contains('/Style Library/') -and !$list.DefaultViewUrl.Contains('/SiteAssets/'))
{
Write-Host "Downloading Documenfts from " $list.Title " Library"
# Write-host "The number of documents in "$list.Title " Library is : " $list.Items.Count
Write "Downloading Documenfts from $($list.Title) Library" >> $logPath
$global:SiteDocumentCount= $global:SiteDocumentCount+$list.Items.Count
$listUrl = $web.Url +"/"+ $list.RootFolder.Url
#Download root files
DownloadDocuments $list.RootFolder.Url
#Download files in folders
foreach ($folder in $list.Folders)
{
DownloadDocuments $folder.Url
}
# Write-host "The number of documents downloaded from $($list.Title) Library is : $($global:DocumentCount)"
}
} #end if
}
catch
{
Write-Host "Issue with access the library : " $list.DefaultViewUrl
Write "Issue with access the library : " $list.DefaultViewUrl >>$logPath
Write "Error: "$_.Exception.Message >>$logPath
}
}
Write "********************************************************************************************" >>$logPath
Write "The number of documents in $($web.Title) Site is : $($global:SiteDocumentCount)" >>$logPath
Write "The number of documents downloaded from $($web.Title) Site is : $($global:DownloadedDocCount)" >>$logPath
Write "********************************************************************************************" >>$logPath
}
# Function: DownloadDocuments
# Description: Downloads a document library's files; called GetGileVersions to download versions.
# Variables
# $folderUrl: The Document Library to Download
function DownloadDocuments($folderUrl)
{
$folder = $web.GetFolder($folderUrl)
foreach ($file in $folder.Files)
{
try
{
#Ensure destination directory
$destinationfolder = $destination + "\" + $folder.Url
$currnetfilepath= $destinationfolder + "\" + $file.Name
$shortPath=$destination +$longFolderDocumentPath
# Try to create a new folder with name "Long path Doc" if current document folder path is greater than 260
if($currnetfilepath.length -ge 250)
{
#Create the folder in root folder instade of following the actual folder structure
if (!(Test-Path -path $shortPath))
{
$dest1 = New-Item $shortPath -type directory
}
}
else
{
if (!(Test-Path -path $destinationfolder))
{
$dest = New-Item $destinationfolder -type directory
}
}
#download file in Long path docs folder if path excced teh limit length
if($currnetfilepath.length -ge 250)
{
$currnetfilepath= $shortPath + $file.Name
}
#Download file
$binary = $file.OpenBinary()
$stream = New-Object System.IO.FileStream($currnetfilepath), Create
$writer = New-Object System.IO.BinaryWriter($stream)
$writer.write($binary)
$writer.Close()
$global:DownloadedDocCount++
#Download file versions. If you don't need versions, comment the line below.
GetVersionFiles $file
}
catch
{
Write-Host "Cannot download the document with the Url : " $file.ServerRelativeUrl "from " $file.Web.ServerRelativeUrl "Site"
Write "WARN : Cannot download the document with the Url : " $file.ServerRelativeUrl "from " $file.Web.ServerRelativeUrl "Site" >>$logPath
Write "Error: "$_.Exception.Message >>$logPath
}
}
}
# Function: GetVersionFiles
# Description: Downloads all versions of every file in a document library
# Variables
# $WebURL: The URL of the website that contains the document library
# $DocLibURL: The location of the document Library in the site
# $DownloadLocation: The path to download the files to
function GetVersionFiles($file)
{
foreach($version in $file.Versions)
{
try
{
#Add version label to file in format: [Filename]_v[version#].[extension]
if($file.Name.length -gt 0)
{
$filesplit = $file.Name.split(".")
$fullname = $filesplit[0]
$fileext = $filesplit[1]
$FullFileName = $fullname+"_v"+$version.VersionLabel+"."+$fileext
#Can't create an SPFile object from historical versions, but CAN download via HTTP
#Create the full File URL using the Website URL and version's URL
$fileURL = $webUrl+"/"+$version.Url
#Full Download path including filename
$DownloadPath = $destinationfolder+"\"+$FullFileName
#Download the file from the version's URL, download to the $DownloadPath location
HTTPDownloadFile "$fileURL" "$DownloadPath"
}
}
catch
{
Write-Host "Cannot download the document for version : " $version.VersionLabel
Write "WARN :Cannot download the document of version : " $version.VersionLabel "for :" $fileURL >>$logPath
Write "Error: "$_.Exception.Message >>$logPath
}
}
}
# Function: HTTPDownloadFile
# Description: Downloads a file using webclient
# Variablesis
# $ServerFileLocation: Where the source file is located on the web
# $DownloadPath: The destination to download to
function HTTPDownloadFile($ServerFileLocation, $DownloadPath)
{
try
{
$webclient = New-Object System.Net.WebClient
$webClient.UseDefaultCredentials = $true
$webclient.DownloadFile($ServerFileLocation,$DownloadPath)
}
catch
{
Write " WARN : Cannot download the document for version : " $version.VersionLabel "for file :" $ServerFileLocation >>$logPath
Write "Error: "$_.Exception.Message >>$logPath
}
}
#Getting Mime Type of the document
function Get-MimeType()
{
param([parameter(Mandatory=$true, ValueFromPipeline=$true)][ValidateNotNullorEmpty()][System.IO.FileInfo]$CheckFile)
begin {
Add-Type -AssemblyName "System.Web"
[System.IO.FileInfo]$check_file = $CheckFile
[sting]$mime_type = $null
}
process {
if ($check_file.Exists) {
$mime_type = [System.Web.MimeMapping]::GetMimeMapping($check_file.FullName)
}
else {
$mime_type = "false"
}
}
end { return $mime_type }
}
#Getting Checksum of the document
function Get-Checksum()
{
param([parameter(Mandatory=$true, ValueFromPipeline=$true)][ValidateNotNullorEmpty()][System.IO.FileInfo]$CheckFile)
begin {
Add-Type -AssemblyName "System.Web"
[System.IO.FileInfo]$check_file = $CheckFile
[string]$checksum = $null
}
process {
if ($check_file.Exists) {
$checksum = Get-FileHash -LiteralPath ($check_file.FullName) -Algorithm SHA256
}
else {
$checksum = "false"
}
}
end { return $checksum.Split(";")[1].Split("=")[1] }
}
Thanks for Reading 🙂
Keep reading, share your thoughts, experiences. Feel free to contact us to discuss more. If you have any suggestion / feedback / doubt, you are most welcome.
Stay tuned on Knowledge-Junction, will come up with more such articles.
