Thursday, January 30, 2020

Integrating SOLR with Adobe Experience Manager


This article shows how to integrate AEM and Solr so that an AEM component can use Solr to perform searches, as shown in the following illustration.


The following list compares building an external search platform on Solr with Oak indexing. Benefits of the Solr approach:

• Full control over the Solr document model
• Control over boosting specific fields in the Solr document
• Real-time indexing is within your control
• Useful when multiple heterogeneous systems contribute content for indexing


Create project from scratch

·         Project creation using the Eclipse plugin.
·         Project creation using the Maven archetype.
·         Deployment to an AEM instance.

Create project using Eclipse:

Download the AEM project archetype from the location below.

https://github.com/Adobe-Marketing-Cloud/aem-project-archetype

Properties to remember while creating AEM projects:


Create Project using maven archetype:

The archetype creates a minimal Adobe Experience Manager project as a starting point for your own projects. The properties provided when using this archetype let you name every part of the project as desired.





Maven command to generate the project:
mvn org.apache.maven.plugins:maven-archetype-plugin:2.4:generate -DarchetypeGroupId=com.adobe.granite.archetypes -DarchetypeArtifactId=aem-project-archetype -DarchetypeVersion=13 -DarchetypeCatalog=https://repo.adobe.com/nexus/content/groups/public/


Deployment in AEM instance

Provided Maven profiles:

autoInstallBundle: Installs the core bundle with the maven-sling-plugin to the Felix console.

autoInstallPackage: Installs the ui.content and ui.apps content packages with the content-package-maven-plugin to the Package Manager of the default author instance on localhost, port 4502. The hostname and port can be changed with the aem.host and aem.port user-defined properties.

autoInstallPackagePublish: Installs the ui.content and ui.apps content packages with the content-package-maven-plugin to the Package Manager of the default publish instance on localhost, port 4503. The hostname and port can be changed with the aem.host and aem.port user-defined properties.

Go to the project's parent folder and execute one of the following commands at the command prompt:
1.      >mvn clean install

This compiles the full project but does not deploy it to your instances.

The following files must then be uploaded to Package Manager and installed manually:
/ui.apps/target/***.zip
/ui.content/target/***.zip

2.      >mvn clean install -PautoInstallPackage -Padobe-public
Note: Check the pom.xml file for the author port.

3.      >mvn clean install -PautoInstallBundle
Installs the bundle only.

4.      >mvn clean install -PautoInstallPackagePublish -Padobe-public
Note: Check the pom.xml file for the publish port.

The archetype provides the following modules:

core - Core bundle (Java code goes here)
it.launcher - Baseline bundle to support integration tests with AEM
it.test - Integration tests
ui.apps - Module for your components, templates, and other application code
ui.content - Project sample/test content, or possibly actual content (keeping actual content in the codebase is not a good practice)

Notes:
In short, an archetype is a Maven project templating toolkit. An archetype is defined as an original pattern or model from which all other things of the same kind are made.
The POM is the fundamental unit of Maven; it resides in the root directory of your project and contains information about the project and the various configuration details Maven uses to build it. Before creating a project, we should therefore decide on the project group (groupId), the artifactId, and the version, as these attributes uniquely identify the project in a Maven repository.
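As an illustration, these identifying coordinates sit at the top of the root pom.xml. The values below are made-up examples, not the article's actual project:

```xml
<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>
  <!-- These three coordinates uniquely identify the project in a Maven repository. -->
  <groupId>com.example.aem</groupId>
  <artifactId>aem-solr-article</artifactId>
  <version>1.0-SNAPSHOT</version>
</project>
```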



Create Solr_search component.




Create the Java files needed to implement Solr facet search in AEM, and understand what each one is used for.



Go to the core module.

SolrSearchService (a Java interface)



SOLRSEARCHSERVICE INTERFACE


The SolrSearchService interface describes the operations exposed by this service. The following Java code represents this interface.
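The original code listing is not reproduced here. As a hedged sketch, the service interface might look like the following; the method names, signatures, and the in-memory stub are illustrative assumptions, not the article's actual code:

```java
// Illustrative sketch of the search service; names are assumptions.
interface SolrSearchService {
    /** Crawl pages under resourcePath and return them as an indexable JSON payload. */
    String crawlContent(String resourcePath);

    /** Push crawled documents to Solr; returns true when indexing succeeds. */
    boolean indexPagesToSolr(String payload);
}

/** Minimal in-memory stub showing how a caller would use the service. */
class StubSolrSearchService implements SolrSearchService {
    @Override
    public String crawlContent(String resourcePath) {
        // A real implementation would walk the JCR and build Solr documents.
        return "[{\"id\":\"" + resourcePath + "\"}]";
    }

    @Override
    public boolean indexPagesToSolr(String payload) {
        // A real implementation would POST the payload to the Solr update handler.
        return payload != null && !payload.isEmpty();
    }
}
```

A real implementation would be registered as an OSGi service so components can reference it.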

SOLRSERVERCONFIGURATION INTERFACE


The SolrServerConfiguration interface describes the operations exposed by this configuration service. The following Java code represents this interface.
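The original listing is not shown here. As a sketch, the configuration interface would mirror the OSGi configuration values described later in this article (protocol, server name, port, core name); the method names below are assumptions:

```java
// Illustrative sketch; method names are assumptions, not the article's actual code.
interface SolrServerConfiguration {
    String getSolrProtocol();   // e.g. "http"
    String getSolrServerName(); // e.g. "localhost"
    String getSolrServerPort(); // e.g. "8983"
    String getSolrCoreName();   // e.g. "collection"
}

/** Stub returning the default values used later in this article. */
class DefaultSolrServerConfiguration implements SolrServerConfiguration {
    public String getSolrProtocol()   { return "http"; }
    public String getSolrServerName() { return "localhost"; }
    public String getSolrServerPort() { return "8983"; }
    public String getSolrCoreName()   { return "collection"; }

    /** Assemble the base URL a search service would target. */
    public String baseUrl() {
        return getSolrProtocol() + "://" + getSolrServerName() + ":"
                + getSolrServerPort() + "/solr/" + getSolrCoreName();
    }
}
```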

Creating a workflow for indexing.

After the page is activated, the custom process step calls the indexing service.
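A real process step would implement com.adobe.granite.workflow.exec.WorkflowProcess. As a self-contained sketch of the logic only, the types below are local stand-ins invented for illustration, not the AEM API:

```java
import java.util.ArrayList;
import java.util.List;

/** Local stand-in for the workflow payload (the real API passes a WorkItem). */
class WorkItemStub {
    private final String payloadPath;
    WorkItemStub(String payloadPath) { this.payloadPath = payloadPath; }
    String getPayloadPath() { return payloadPath; }
}

/** Sketch of a custom process step that triggers Solr indexing after activation. */
class SolrIndexProcessStep {
    private final List<String> indexedPaths = new ArrayList<>();

    /** Called by the workflow engine when the step executes. */
    void execute(WorkItemStub item) {
        String path = item.getPayloadPath();
        // Only index real content pages, not system paths.
        if (path != null && path.startsWith("/content")) {
            indexedPaths.add(path); // a real step would call the search service here
        }
    }

    List<String> getIndexedPaths() { return indexedPaths; }
}
```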



Build the OSGi bundle using Maven



To build the OSGi bundle by using Maven, perform these steps:
  1. Open the command prompt and go to aem-solr-article.
  2. Run the following Maven command: mvn clean install.
  3. The OSGi component can be found in the following folder: aem-solr-article\core\target. The file name of the OSGi component is solr.core-1.0-SNAPSHOT.jar.
  4. Install the OSGi bundle using the Felix console.

Set up the Solr Server


Download and install the Solr server (solr-6.2.0.zip) from the following URL:
Then create a new core.


Configure AEM to use Solr Server


Configure AEM to use the Solr server. Go to the following URL:
Search for AEM Solr Search - Solr Configuration Service and enter the following values:
  • Protocol - http
  • Solr Server name - localhost
  • Solr Server Port: 8983
  • Solr Core Name - collection (references the collection you created)


Select the core from the drop-down control and select Query, then click the Execute button. If successful, you will see the result set.
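The Execute button issues an HTTP GET against the core's /select handler. As a minimal sketch, the same query URL could be built in Java like this (the parameter values are examples; q=*:* matches all documents):

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

/** Builds a Solr /select query URL like the one the admin console executes. */
class SolrQueryUrlBuilder {
    static String buildSelectUrl(String baseUrl, String query, int rows) {
        try {
            // wt=json asks Solr for a JSON response body.
            return baseUrl + "/select?q=" + URLEncoder.encode(query, "UTF-8")
                    + "&rows=" + rows + "&wt=json";
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }
}
```

For example, buildSelectUrl("http://localhost:8983/solr/collection", "*:*", 10) yields a URL a search service could fetch with any HTTP client.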



View Solr results in an AEM component


In CRXDE lite, open the following HTML file:
/apps/solr/components/content/solrsearch/solrsearch.html
and add the following script at the top of the page:
<script src="https://code.jquery.com/jquery-3.1.0.js" integrity="sha256-slogkvB1K3VOkzAI8QITxV3VzpOnkeNVsKvtkYLMjfk=" crossorigin="anonymous"></script>  
To access the component that displays Solr values, enter the following URL:





Indexing JSON data


Let's see how we can index JSON data in Solr.




We need to follow two steps:

1) Define the field descriptions for the new JSON data.
2) Publish the data to Solr.

Let's use the books.json file provided by Solr itself for indexing JSON data.


books.json is available inside solr-6.2.0\example\exampledocs

Let's add the field descriptions below to the schema.xml file (solr-6.2.0\server\solr\MyCore\conf) after the <uniqueKey>id</uniqueKey> tag:

<!-- Fields added for indexing books.json file-->
 <field name="cat" type="text_general" indexed="true" stored="true"/>
 <field name="name" type="text_general" indexed="true" stored="true"/>
 <field name="price" type="tdouble" indexed="true" stored="true"/>
 <field name="inStock" type="boolean" indexed="true" stored="true"/>
 <field name="author" type="text_general" indexed="true" stored="true"/>

If we observe the fields in the books.json file, we can see that 10 fields are available inside this file, but we have provided only 5 field descriptions in the schema.xml file.
What happens to the other fields? Will they be indexed?
Yes, the other fields will also be indexed, but how?
The id field in the books.json file is handled for indexing by the uniqueKey element of the schema.xml file: <uniqueKey>id</uniqueKey>
The other 4 fields are indexed via the dynamicField rules in schema.xml.
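A dynamicField rule matches leftover fields by name pattern instead of by exact name. For illustration, a rule of this kind looks like the following (the pattern and type here are an example of the sort of rules the default schema ships with, not a copy of it):

```xml
<!-- Any field ending in _s that has no explicit <field> entry
     is still indexed as a string via this pattern rule. -->
<dynamicField name="*_s" type="string" indexed="true" stored="true"/>
```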

Now let's post the data to Solr to index it.

Navigate to the following path in the command prompt: solr-6.2.0\example\exampledocs
Run the command below:
java -Dtype=text/json -Durl=http://localhost:8983/solr/MyCore/update -jar post.jar books.json
Since it's a Java command, we can pass runtime arguments using -D. We are passing 2 Java runtime arguments here:

-Dtype - Specifies the type of the file (CSV, XML, JSON, etc.). We pass JSON because our published data is JSON.
-Durl - URL of the core under which indexing has to happen.
We can see that the Solr server has indexed the file, committed the indexed data in MyCore, and displayed the output in the command prompt.
Now access the URL below and check the statistics of the indexed data:
http://localhost:8983/solr/#/MyCore
We can observe that Num Docs displays the number of records that were indexed.
Since we have 4 records in the books.json file, all of these records are indexed, and hence Num Docs displays 4.
To access the indexed data, we can query it directly in the Solr Admin console without any conditions.
Access the URL below: http://localhost:8983/solr/#/MyCore
Select MyCore and click on the Query option.
Now click on Execute Query.


Creating new Solr Core

What is a Solr Core?


A Solr core is basically an index of the text and fields found in the documents that we publish to Solr.
A single Solr instance can contain multiple cores, which are separate from each other based on local criteria.
For example, if a project has 2 databases, we may decide to use a separate core for each database.
Each core has its own configuration and indexed-data storage location.
When the Solr server runs in standalone mode, this configuration is called a Core.
When the Solr server runs in Cloud mode, this configuration is called a Collection.

This core folder will contain all the configurations and indexed data.

Let's create a core configuration for a standalone server.

We can create the core using the solr create command.
Navigate to the solr-6.2.0\bin folder in the command prompt
and run the command below:
solr create -c MyCore
Now navigate to solr-6.2.0\server\solr.
We can see that a MyCore folder has been created; this folder serves as the configuration folder for our indexing.

Inside the MyCore folder we can see 2 folders, conf (used for configuration) and data (used for storing indexed data), along with a core.properties file, which contains the name of the core.


Let's understand the core concepts and make some changes in MyCore.

Every core needs to have the 4 important files listed below:
1) solr.xml
2) solrconfig.xml
3) schema.xml
4) core.properties

1) solr.xml

This file lives in the same place where our new MyCore folder is located.
It is used to configure the Solr cores.
2) solrconfig.xml

This file is created automatically when we run the solr create command and is available inside the MyCore/conf directory.
It is used to configure the Solr server at a high level.
For example, we can change the location of the data directory in this file, and Lucene details are added here.

At the end of this file, before the </config> tag, add the line below:

<schemaFactory class="ClassicIndexSchemaFactory"/>
This enables Schema mode, in which we can manually edit the schema.xml file to define our own field-type descriptions.

3) schema.xml

The file is generated as managed-schema when we create a new core, because Solr uses Schemaless mode by default.
The managed-schema file is available inside the MyCore/conf directory.
Rename this file to schema.xml, since we changed the configuration in solrconfig.xml to use Schema mode.
managed-schema should be used if we are using Schemaless mode, which is enabled by default in Solr.
This file contains the description of the fields that we pass to Solr for indexing.
4) core.properties

This file is created automatically when we run the solr create command and is available inside the MyCore directory.
It defines the properties used by our core, such as the core name, the solrconfig file, the schema file, etc.
If we don't add any values inside core.properties, default values are used automatically.
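For illustration, a minimal core.properties can contain nothing more than the core name, with all other keys (config, schema, data directory) falling back to defaults:

```properties
name=MyCore
```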

Since we have changed some of the configuration details, restart the Solr server.

Navigate to the path below in the command prompt:
solr-6.2.0\bin
Stop the Solr server using the command below:
solr stop -all
Start the Solr server again using the command below:
solr start
Let's access Solr in the browser using the URL below and check whether the core is listed in the core selector.

What is Indexing?


Indexing is the process of arranging data in a more systematic and efficient way so that the information in a document can be located much faster.


Let’s understand Indexing with the below examples


Example 1

Assume that we have the following table in the DB, which stores a person's details:
Person Table
If we want to fetch the records whose last name is Dravid, the database scans each and every row to match the last name against Dravid; if it matches, that row is added to the result set.
This requires going through each and every row, even when a row has a different last name than Dravid.
Don't you think it takes more time, since it has to scan through unwanted rows?
Yes, it will certainly take more time, as it goes through each and every row.
Now observe the data below.

Here we have arranged the table data by last name, in alphabetical order.
Now when we search for the last name Dravid, we can identify the right row based on the alphabetical order and then get the result accordingly.
Here it is not necessary to go through all the rows to find Dravid, because we know that LastName is arranged in alphabetical order.
This improves performance.
If we have 100000 rows, it improves search performance drastically.
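The idea in Example 1 can be sketched in a few lines of Java: a linear scan touches every row, while an index (here, a simple map from last name to matching rows) jumps straight to the match. This is an illustration of the concept only, not Solr's actual data structures:

```java
import java.util.*;

/** Demonstrates linear scan vs. index lookup over a tiny Person table. */
class IndexDemo {
    static List<String[]> table = Arrays.asList(
            new String[]{"Rahul", "Dravid"},
            new String[]{"Sachin", "Tendulkar"},
            new String[]{"Anil", "Kumble"});

    /** Without an index: examine every row, even non-matching ones. */
    static List<String> scanByLastName(String lastName) {
        List<String> hits = new ArrayList<>();
        for (String[] row : table) {
            if (row[1].equals(lastName)) hits.add(row[0]);
        }
        return hits;
    }

    /** Build an index once: last name -> first names. Lookups are then direct. */
    static Map<String, List<String>> buildIndex() {
        Map<String, List<String>> index = new HashMap<>();
        for (String[] row : table) {
            index.computeIfAbsent(row[1], k -> new ArrayList<>()).add(row[0]);
        }
        return index;
    }
}
```

Both approaches return the same result; the index simply avoids re-scanning the table on every search, which is exactly the trade-off Solr makes at much larger scale.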

Example 2

Another example is a textbook that has an index in it.
Assume we want to find a chapter called Brave Man; if the book has no index at the beginning, it is very difficult to find that chapter, and it takes more time.
If we have an index defined like this:
ChapterName : page number
then we can easily look up the chapter name, get the page number from the index, and open the chapter in the book without much effort.
So in both cases, indexing increases the performance of searching.
This is exactly what many websites need, especially e-commerce sites, where people search a lot.
This way of representing data more efficiently to make search faster is called indexing.
There are many frameworks on the market that help achieve indexing and also provide many more features along with it, such as faceted navigation, hit highlighting, caching, etc.
Some of the frameworks available are Solr, Sphinx, Elasticsearch, Algolia, Swiftype, etc.
Each framework has its own way of indexing the data published to it.
Solr is the most widely used open-source search server, with a run-rate of over 6,000 downloads a day and installations at 4,000 companies, according to the Solr wiki statistics at the time of writing.
Check the link below to read more.